II. ABSTRACT


Currently, the structure of stored data is determined implicitly by the software
which accesses and processes it. This data structuring technology has given rise
to two outstanding problems in data processing. First, there is the communication
of the exact structure of data to users and machines, and secondly, the interchange
of the data itself.

This  work  contributes  to  overcoming  these  problems  by  developing  a  technique 
for  describing  the  structure  of  data  explicitly  and  independently  of  machines  and 
software. This aim is reflected in the following objectives:

1) To understand data structures by developing a model which not only
characterizes current data organizational techniques, but also provides
a framework within which new data structures can be defined;

2) To use this model to develop a language which can explicitly describe
the organization of data;

3) To use this model to study how data can be converted from one structure
to another, with a view towards developing a method for describing data
conversions.

The model unifies the diverse area of data structures by including the record,
file and storage organizations of data. Furthermore, the model clearly separates
at each level the conceptual part, which is the logical structure imposed by a user,
from the implementation part, which is the method by which the logical structure is
encoded as a binary representation. This separation leads to a straightforward
mapping of a file onto storage. From an analysis of the state-of-the-art in
data organization, it is shown that the model can express not only the data
structures of current systems, but also certain useful generalizations
which might well be produced by future systems.

The model treats records as hierarchies of data items. These hierarchies
are expressed by production systems based on a generalized notion of
attribute-value pairs. Files are treated as graphs whose nodes are records. The
connections  between  the  nodes  are  expressed  using  a  powerful  production  system 
which  generates  criteria  for  determining  when  any  two  records  are  to  be  linked. 

The  structure  of  storage  is  generalized  as  a  hierarchy  since  this  structure  is 
common  to  all  storage  media.  The  mapping  of  files  onto  storage  is  expressed  in 
terms of rules for distributing the records of the file within the slots
provided by the storage structure.

The language, called Generalized Data Description Language (GDDL), is
a  realization  of  the  model,  and  thus  possesses  all  its  capabilities.  In  particular, 
the  language  can  describe  the  implementation  of  any  aspect  of  a  file  as  being 
dependent  on  any  other  aspect.  The  language  is  presented  in  an  appendix  in  the 
form  of  a  user's  manual. 

Data  conversion  is  studied  in  terms  of  transforming  data  in  one  structure 
to  another,  where  both  structures  are  expressed  in  the  model.  This  study  shows 
that  to  fully  specify  a  conversion  the  relationship  between  the  components  of  the 
two  structures  must  be  specified.  In  certain  cases,  such  as  the  reorganization 
of  a  file,  this  relationship  can  be  very  elaborate.  A  method  is  developed  for 
specifying  such  relationships,  and  a  corresponding  capability  is  built  into  GDDL. 
Thus, GDDL has the ability not only to fully describe data structures, but also
to specify data conversion.

CHAPTER 1 INTRODUCTION


1.1  Background  and  Objectives 

Computer  technology  is  a  field  which  has  experienced  a  rapid  and 
uneven  evolution.  This  evolution  has  seen  computer  users  develop 
techniques  and  conventions  appropriate  only  to  their  own  needs  and  data 
processing  environments.  This  has  led  to  the  inability  of  different 
user  groups  to  communicate  information  about,  and  to  exchange  algorithms 
and  data  effectively.  The  problem  of  user  and  machine  dependent 
algorithms has received considerable attention, resulting in the development
of widely accepted and largely machine independent programming
languages  such  as  ALGOL.  However,  the  severity  of  the  problems  of 
user  and  machine  dependent  data  organization  has  only  been  realized 
comparatively recently*, and as yet little has been done to alleviate
this situation.

* "It has been estimated that the lack of an adequate data description
language is costing the Department of Defense alone millions
of dollars annually because of the inability to exchange data
effectively." (Ma 1969, pg. 1)

Traditionally data is organized either by developing special software
or by specifying its structure in existing programming languages,
operating systems or data management systems. In either case, the
exact data organization can only be understood by analyzing and interpreting
several complex and interacting programs written in a variety
of languages. For example, to understand the data structures produced
by a particular COBOL program, it is necessary to analyze and interpret
the following programs:

(i) the COBOL program itself,

(ii) the COBOL compiler, and

(iii) the data management system of the machine being used.

This  effort  is  necessary  because  the  factors  which  determine  the 
organization  of  data  are  implicit  in  the  programs  and  software  used 
to  process  and  structure  the  data.  Consequently,  such  practices  in 
data  organization  have  hampered  not  only  the  communication  of  data 
structures but also the interchange of the data itself. When data is to
be interchanged, it is necessary to know first whether the existing
organization is compatible with the new software which is to use it,
and secondly, how the organization can be converted to make it compatible
when this is not the case. The implicit nature of data organization
can make this an onerous task.

A solution to these problems of communication and data interchange
is to make the organization of data explicit and its understanding
independent of machines and software systems. This can be achieved by
developing a language for explicitly specifying data structures which
is separate from the languages used to process that data. To understand
a data structure, it is then only necessary to interpret a
specification which is expressly intended to communicate data structure
information, rather than to interpret a program one of whose side
effects is the structuring of data.

Such a data description language (ddl) would have many applications.
One important application is to provide a means of communicating
data structures among users. For example, using a ddl a creator
of  a  data  base  can  describe  precisely  to  an  applications  programmer  the 
exact  structure  of  the  data  that  the  programmer  wants  to  use.  Just  as 
ALGOL  is  now  used  to  communicate  algorithms  so  can  a  ddl  be  used  to 
communicate  data  structures. 

Not  only  can  a  ddl  be  used  to  communicate  with  users,  but  by 
constructing  a  ddl  interpreter,  the  ddl  can  be  used  to  communicate  with 
machines. Using such an interpreter, a computer could use the information
contained in any file when it is provided with a ddl description
for  that  file.  Users  would  then  be  free  to  structure  their  data  in 
whatever  manner  they  deem  appropriate,  without  being  constrained  by  the 
data  structure  specification  facilities  available  in  operating  systems 
and  programming  languages.  Thus,  a  ddl  could  be  used  in  establishing 
automatically the structure of data bases. A data base creator would
provide  a  ddl  description  and  his  data  to  the  interpreter  which  would 
structure  the  data  according  to  the  description. 

Furthermore,  we  could  apply  a  ddl  to  the  problem  of  mechanizing 
the  conversion  of  data  from  a  current  structure  to  a  new  structure. 

It  would  only  be  necessary  to  input  to  a  converter  the  data,  a  ddl 
description  of  its  current  structure,  a  ddl  description  of  its  new 
structure  and  a  ddl  description  of  the  relationship  between  elements 
in one structure and the other. By interpreting these descriptions
the converter could output the data in its new structure. Thus, the
user is released from writing special conversion programs. In this
way  files  could  be  interfaced  across  programming  language,  operating 
system,  data  management  system  and  hardware  barriers. 

A  further  application  is  in  the  design  and  operation  of  data  and 
data  base  management  systems.  For  example,  a  ddl  can  be  used  to  create 
new data structures which can then be tested for effective storage utilization
and other efficiency considerations.

At this point we should make clear what we mean by the term "data
structure". We use the term to refer to the structure of data as it is
to appear on a storage medium, including both the conceptual organization
imposed by the user and the implementation of this conceptual organization.
Some research groups, particularly those in programming languages
(St 1967, Ga 1970), often use data structure to refer to not only
the structure of data (as we use the term) but also the access method
by which this data is used. To these groups a pushdown, for example,
is a data structure, whereas we would say that a pushdown is a data
structure together with an access method which controls storage and
retrieval on a last in - first out basis. An access method is a program
which is designed to store and retrieve data from a data structure.
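
The distinction drawn here can be illustrated with a minimal sketch in Python, a language chosen purely for illustration and not used in this work. The list `pushdown_data` stands for the data structure; the hypothetical functions `push` and `pop` stand for the access method that imposes the last in - first out discipline.

```python
# The data structure: an ordered sequence of items, described on its own.
pushdown_data = []

# The access method: programs that store into and retrieve from the
# structure on a last in - first out basis (names are illustrative).
def push(structure, item):
    """Store an item at the top of the structure."""
    structure.append(item)

def pop(structure):
    """Retrieve and remove the most recently stored item."""
    return structure.pop()

push(pushdown_data, "A")
push(pushdown_data, "B")
top = pop(pushdown_data)  # the last item stored is the first retrieved
```

The same list could equally be paired with a first in - first out access method, which is why the two are treated separately.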

It  follows  from  our  discussion  above  that  we  need  to  separate  out  data 
structures  from  the  programs  which  use  them,  so  we  can  describe  the 
data structures independently and explicitly. Furthermore, any appropriate
access method can be designed once the data structure has been
specified.

With this background in mind, we state three objectives for this
dissertation: 

1) To understand data structures by developing a model which
not only characterizes current data organizational techniques,
but also provides a framework within which new data structures
can be defined.

2) To use this model to develop a language which can explicitly
describe  the  organization  of  data. 

3)  To  use  this  model  to  study  how  data  can  be  converted  from  one 
structure  to  another,  with  a  view  towards  developing  a  method 
for  describing  such  conversions. 

It  is  anticipated  that  data  description  languages  will  contribute 
as  much  as  programming  languages  towards  the  evolution  of  information 
processing.  Just  as  the  current  state  of  programming  languages  is  the 
accumulation of many efforts, it is expected that much research and
development will be needed to fully understand the power and applicability
of data description languages. The development of the ddl in
this  dissertation  is  perhaps  analogous  to  the  development  of  the  first 
programming  language.  Different  programming  languages  usually  have 
different  models  of  algorithms  on  which  they  are  based.  For  example, 
ALGOL  is  based  on  recursive  procedures  with  arithmetic  operations, 
whereas LISP is based on the lambda-calculus and string manipulations.
Similarly, we provide our own model of data organization on which our
data description language is based.

There are other studies in progress which relate to the design of
a ddl, specifically the studies being made by the CODASYL Storage Structure
Description Language Task Group (SSDL 1970). However, this group
so far has mainly addressed itself to techniques for mapping records
onto storage, which is just a subset of the problem we have tackled here.
The language given here is the first one to be completely developed and
specified. In addition, we are the first to study and propose a general
solution  for  the  problem  of  using  data  descriptions  for  converting  data 
from  one  structure  to  another. 

1.2 The Development of the Model, the Design of the Language, and the
Study of Conversion

We will now discuss the development of the model and its use in
the design of the ddl (called GDDL for Generalized Data Description
Language) which is presented in this report.

The development of data description from its first primitive forms
in machine languages to its current forms in data management systems has
been based on ad hoc changes triggered by user needs and new technology.
This has led to a wide variety of methods for describing data, without
any general concept or comprehensive model. For example, COBOL (US 1968)
is based on highly developed record concepts, whereas L6 (Sa 1969)
is based on certain aspects of list structures, and in operating
system design, systems programmers have built up a body of expertise
on storage structures and file implementation techniques. However, the
common concepts underlying these and other aspects of data structures
have not been extracted and formulated into a comprehensive model.
Therefore, a thorough study of the data description elements in
software  systems  and  programming  languages  was  undertaken,  with  a  view 
towards  extracting  those  common  elements  to  include  in  a  comprehensive 
model  of  data  structures. 

This model of data structures is divided into three largely independent
levels, namely, the record, file and storage levels, and each
level  is  further  subdivided  into  a  conceptual  part  and  implementation 
part.  The  conceptual  part  is  the  logical  structure  which  is  imposed  on 
the  data.  The  implementation  part  is  the  way  in  which  this  structure 
is  to  be  represented  or  encoded.  The  components  of  this  subdivision 
of  data  structures  are  illustrated  in  Figure  1-1. 


Figure 1-1. The Components of a Data Structure and their Interrelationships

These subdivisions provide a valuable vantage point for understanding
data structures. Let us look first at the implications of the
division  into  conceptual  and  implementation  parts. 

The nature of the conceptual part is quite distinct from the implementation
part, even though most systems do not make this distinction.
The conceptual part is the machine-independent structure which is imposed
on the data by the user. He conceives of the data as being organized in
this fashion, and this is the form in which his programs expect to find
the data. The implementation part, which is machine-dependent, is the
way in which the logical structure is encoded as a bit string representation
which can be stored on a storage medium. In our model we will see
that specifications which relate to the conceptual part have the nature
of production systems, whereas specifications which relate to the
implementation part have the nature of certain characteristics of
character strings like length or character code.
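
As a rough illustration of the two parts, the following Python sketch (not GDDL, with all names assumed for the example) keeps the conceptual record fixed while varying one implementation characteristic, the character code. The record contents, the item lengths and the use of the `cp037` EBCDIC code page are assumptions made only for this example.

```python
# Conceptual part: the logical structure the user has in mind
# (a hypothetical record with two data items).
employee = {"NAME": "SMITH", "DEPT": "42"}

def encode(record, lengths, code):
    """Implementation part: encode each item as a fixed-length character
    string in the given character code, then concatenate the results."""
    out = b""
    for name, value in record.items():
        out += value.ljust(lengths[name]).encode(code)
    return out

lengths = {"NAME": 8, "DEPT": 4}          # implementation characteristic: length
as_ascii = encode(employee, lengths, "ascii")
as_ebcdic = encode(employee, lengths, "cp037")  # an EBCDIC code page
```

The same conceptual record yields two different byte strings of equal length, which is the sense in which the implementation part is machine-dependent while the conceptual part is not.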

In addition, this subdivision yields a valuable insight which has
not been noted in other work. This insight is based on the observation
that if a person intends to organize certain entities into a structure,
he may want that organization to depend on any property of those entities
which are available to him. In particular, if a person wants to
organize records into a file, he may specify this organization in terms
of any available properties of those records. These properties can include
the values of data items in records, the logical structure of the
records and the implementation of the record structure. Thus we can
see that to describe file organization we have to provide more than
the capability of just specifying abstract graphical structures.

Now  we  look  at  the  implications  of  dividing  the  model  into  record, 
file  and  storage  levels. 

The concept of a record is common to all data storage and retrieval
systems, yet it is usually overlooked in theoretical studies of data
structures. The structure of records is an important consideration in
that it is the basic organization of data items which is treated as an
entity for storage and retrieval. Thus far a hierarchic organization for
records has proven adequate, as it provides a structure which is relatively
easy to encode and decode without the need for extended scanning
operations.  In  this  work,  therefore,  we  only  allow  hierarchic  structures 
at  the  record  level.  In  our  model  this  hierarchic  organization  is 
generalized  in  that  it  allows  for  levels  of  the  hierarchy  to  occur 
optionally  or  to  repeat  a  number  of  times.  This  conceptual  structure  of 
records  has  not  been  modelled  explicitly  before,  although  it  is 
essentially  the  logical  organization  of  records  which  is  implicit  in 
COBOL.  COBOL,  however,  is  quite  restrictive  on  the  ways  in  which  the 
implementation  of  records  may  be  specified.  In  this  work  we  allow  each 
implementation  characteristic  to  be  specified  either  directly  or 
dependent  on  other  characteristics. 
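
A generalized hierarchy with optional and repeating levels can be sketched as follows. This is Python rather than GDDL, and the tuple layout `(name, min_occurs, max_occurs, children)`, the function `conforms` and the ORDER record are all hypothetical names introduced for illustration.

```python
# Hypothetical record description. Each level is
# (name, min_occurs, max_occurs, children); children is None for an
# elementary data item.
ORDER = ("ORDER", 1, 1, [
    ("CUSTOMER", 1, 1, None),
    ("DISCOUNT", 0, 1, None),   # a level that occurs optionally
    ("LINE", 1, 3, [            # a level that repeats up to three times
        ("PART", 1, 1, None),
        ("QTY", 1, 1, None),
    ]),
])

def conforms(desc, occurrences):
    """Check a list of occurrences of one level against its description."""
    name, lo, hi, children = desc
    if not lo <= len(occurrences) <= hi:
        return False
    if children is None:
        return True
    # Every occurrence of a group must itself conform at each child level.
    return all(
        conforms(child, occ.get(child[0], []))
        for occ in occurrences
        for child in children
    )

record = {"CUSTOMER": ["ACME"],
          "LINE": [{"PART": ["P1"], "QTY": ["2"]},
                   {"PART": ["P7"], "QTY": ["1"]}]}
ok = conforms(ORDER, [record])
```

The record above omits the optional DISCOUNT level and repeats the LINE level twice, and it still conforms to the description.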

Records are the elements which are organized into files. There
is great flexibility in distributing the overall organization of a set
of data items between the record and file levels. On one hand, we can
specify a record to consist of a single data item, and, in effect,
specify the overall organization of the data at the file level. In
fact we can specify hierarchies at the file level and thus all the
conceptual structure for records can in principle be moved to the file
level. However, while the conceptual structure of the data might remain
the same, the use of the data for storage and retrieval has been changed.
On the other hand, we can specify a record to be a complex hierarchic
structure and possibly make the file structure simple. The distribution
of structure between the file and record levels depends on the intended
use of the data. Therefore, by distinguishing record structure from
file structure we are able to include these aspects of data structures in
our model.

Our concept of a file structure is more general than others because,
as previously mentioned, we allow the specification of graphical
structures which depend on data and record properties. This requires
a more elaborate specification method than the usual methods based on
pure graph theory.
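
The idea of a graph whose links depend on record properties can be sketched in Python as follows. A single predicate stands in for the production system of the model, which is considerably more powerful; the records, field names and functions are hypothetical.

```python
from itertools import permutations

# Hypothetical records; the field names are illustrative only.
records = [
    {"id": 1, "name": "ADAMS", "manager": None},
    {"id": 2, "name": "BAKER", "manager": 1},
    {"id": 3, "name": "CLARK", "manager": 1},
]

def link_criterion(a, b):
    """Link record a to record b when b names a as its manager."""
    return b["manager"] == a["id"]

def build_file_graph(recs, criterion):
    """Return the directed links (as id pairs) selected by the criterion."""
    return [(a["id"], b["id"])
            for a, b in permutations(recs, 2)
            if criterion(a, b)]

edges = build_file_graph(records, link_criterion)
```

Here the links are not listed explicitly; they follow from the values of data items in the records, so changing a value changes the file structure.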

The specification of the structure and encoding of records, and
the specification of how these records are structured and implemented
as a file determine a bit string representation of the file. This is
the bit string which is actually mapped onto a storage structure.

Our division of storage structures into conceptual and implementation
parts is the key to both simplifying the mapping of the bit string
representation of a file onto a storage structure, and also simplifying
the specification of storage structures by extracting the structure
common to storage media independent of physical considerations. The
conceptual structure of storage is based on generalized hierarchies
which are common to all storage media. The implementation of these
hierarchies is based on encoding characteristics which are also independent
of the storage media. To bind a storage structure to a particular
medium, we have only to relate the levels of the hierarchy to the
actual physical levels of a storage medium.

It is over such a storage structure that the bit string representation
of a file is distributed. A result of our subdivision of data
structures has been to make the actual mapping of data onto a storage
medium comparatively straightforward. It is only necessary to deconcatenate
the bit string representation of the file at appropriate points,
and insert these component strings without disturbing their order into
the slots already provided by the storage structure.
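
The mapping step can be sketched in Python as follows. The two-level storage structure (tracks containing fixed-size blocks), the sizes and the function name `deconcatenate` are assumptions made for illustration, and fixed-size cuts are only the simplest choice of "appropriate points".

```python
def deconcatenate(bits, slot_sizes):
    """Cut the bit string into consecutive pieces, one per slot, in order."""
    pieces, start = [], 0
    for size in slot_sizes:
        pieces.append(bits[start:start + size])
        start += size
    if start < len(bits):
        raise ValueError("storage structure has too few slots")
    return pieces

# Hypothetical storage structure: two tracks, each holding two 4-bit blocks.
tracks, blocks_per_track, block_size = 2, 2, 4
file_bits = "1010110000111101"

flat = deconcatenate(file_bits, [block_size] * (tracks * blocks_per_track))
# Regroup the flat list of blocks into the storage hierarchy (track -> block).
storage = [flat[t * blocks_per_track:(t + 1) * blocks_per_track]
           for t in range(tracks)]
```

Reading the slots back in hierarchy order reproduces the original bit string, since the order of the component strings is never disturbed.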

These are the insights and advantages which are obtained by subdividing
our model in the above way. From the study of data description
elements in software systems and programming languages we can
ensure that we at least included the data description capabilities of
every current system that was considered. As each of the classes of
software in the study includes the most sophisticated representative of
that class, it is likely that we have in fact included the capabilities
of all current systems. From this model the requirements for
a data description language are immediately apparent. This allows
GDDL itself to be very closely related to the model.

When the data description capability of the language had been
designed,  the  problem  of  using  descriptions  to  convert  data  from  one 
structure  to  another  was  studied.  Using  ddl's  for  data  conversion 
is  one  application  that  has  been  widely  suggested,  but  never  actually 
investigated.  With  our  model  of  data  structure,  we  could  study  the 
conversion process itself. In this study it will be shown that additional
information is required to completely describe a conversion. This
additional information specifies a relationship, which can be quite
elaborate, between names in one description and names in the other.
To model this relationship the concept of an association list was
developed. GDDL capabilities for describing data conversion relationships
are incorporated directly from the association list concept.
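
The precise form of an association list is developed in Chapter 6 and is not reproduced here. The Python sketch below is therefore only an assumption-laden illustration of the underlying idea, namely that names in the target description are related to names in the source description. The record, the names and the function `convert` are all hypothetical and do not reflect GDDL syntax.

```python
# Hypothetical source record; names are illustrative only.
source = {"SURNAME": "SMITH", "FORENAME": "JOHN", "DEPT-NO": "42"}

# Each entry associates a name in the target description with the
# name(s) in the source description from which its value is obtained.
association_list = [
    ("FULL-NAME", ["FORENAME", "SURNAME"]),
    ("DEPARTMENT", ["DEPT-NO"]),
]

def convert(record, associations):
    """Build a target record by following the association list."""
    return {target: " ".join(record[name] for name in sources)
            for target, sources in associations}

target = convert(source, association_list)
```

Neither description alone says that FULL-NAME is formed from FORENAME and SURNAME; that relationship is the additional information a conversion requires.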

1.3 Organization of the Report

The  GDDL  language  itself  is  presented  in  Appendix  A  in  the  form 
of a self-contained reference manual. The body of this report
therefore is concerned with presenting the model and its relationship
to the language. It also shows that GDDL can describe any data organization
that can be obtained with current systems. Further, because the
model allows generalizations of current data description capabilities,
GDDL can describe data organizations that are beyond these present
capabilities but might well be incorporated into future systems. The
generality of GDDL relative to current systems is discussed in terms

Chapter 2 presents the study of the development of data description
in programming languages and software systems. The table at the
end  of  this  study  (Table  2-1)  provides  the  basis  for  showing  that  the 
model and thence GDDL include all current data structure capabilities.
This study is quite long and the details are not essential for understanding
the remaining chapters. The reader is therefore advised to
skip to Chapter 3 should the detail become too oppressive.

Chapters 3, 4 and 5 develop the record, file and storage levels
of  the  model  respectively.  Each  chapter  shows  the  relationship  between 
the  model  and  the  GDDL  language  at  that  level.  The  material  in  these 
chapters  provides  an  excellent  way  of  visualizing  the  structure  of  GDDL 
and  its  description  capabilities. 

Chapter  6  discusses  the  ways  of  using  data  descriptions  to  convert 
data from one structure to another. The concept of an association list
is  introduced  and  it  is  shown  how  an  association  list  can  be  used  to 
complete  the  specification  of  data  conversion. 

Chapter  7  summarizes  the  contributions  of  this  report  and 
suggests  directions  for  future  research. 

Appendix B contains examples of GDDL descriptions of some real-world
files and of data conversion from one structure to another. These
examples are chosen to further demonstrate the ability of GDDL to
describe  current  data  organizations. 

Appendix  C  contains  a  proof  that  GDDL  can  indeed  describe  all 
the  COBOL  record  features.  COBOL  is  the  prototype  for  the  most  advanced 
record level data representations. It is shown that each COBOL record
description clause can be expressed in GDDL.


CHAPTER 2 EXISTING DATA STRUCTURES AND
DATA DESCRIPTION LANGUAGES

2.1  Introduction 

The object of this chapter is to provide an analysis of data
structures in contemporary computer software with a view towards obtaining
a comprehensive summary of data structure characteristics. This
summary provides the basis for demonstrating in later chapters that the
GDDL is complete.

The software systems covered by this analysis are:

(i)  machine  languages, 

(ii)  early  operating  systems, 

(iii)  assembly  languages, 

(iv)  early  higher-level  programming  languages, 

(v)  current  operating  systems, 

(vi)  current  higher-level  programming  languages, 

(vii)  data  base  management  systems,  and 
(viii)  the  CODASYL  Data  Description  Language. 

The characteristics of each of these systems are analyzed in a
separate  section  of  this  chapter.  The  final  section  combines  the 
results  of  these  analyses  into  a  table. 

2.2  Data  Structures  in  Machine  Languages 

In machine languages, there are four ways that data structure
characteristics are specified:

1) hardware specifications for conventions such as the code for
representing characters, the base for representing numbers, and the length
of the smallest addressable unit of storage. These conventions are
fixed for a given computer but may vary from machine to machine. To use
a particular machine, a system programmer has to know these conventions.
Thus, descriptions in the form of specifications in manuals are usually
provided.

2)  machine  language  instructions  that  specify  the  data  type 
(e.g.,  character  or  number),  the  scale  of  numbers  (e.g.,  fixed  point  or 
floating point), and the precision of numbers (e.g., single or double).
These  descriptive  elements  are  implicit  in  data  manipulation  instructions 
rather  than  explicit  as  declarations.  They  are  illustrated  by  the 
following  examples. 

a) To specify that a character string is to be placed in the
accumulator of the computer, the machine language instruction CAL (Clear
and Add Logical Word) would be used instead of the instruction CLA for
placing a number in the accumulator.

b) To specify that a floating point number is to be added
to the accumulator, the instruction FAD (Floating Add) would be used
instead of the fixed point instruction ADD.

c) To specify double precision for addition, the instruction
DFAD (Double Precision Floating Add) would be used instead of the single
precision instruction ADD.

3) machine language instructions that specify locations of data
items. These descriptive elements are also implicit in data manipulation
instructions rather than explicit as declarations. For example,
the STO (Store) instruction both declares that a particular location
is to be used for storage and specifies that a data item is to be
stored in that location.

4)  machine  language  instructions  that  specify  which  devices  are 
to  be  used  for  input  and  output,  and  how  data  would  be  organized  on  the 
device  medium.  These  descriptive  elements  are  also  implicit  in  data 
manipulation instructions rather than explicit as declarations. They
are  illustrated  by  the  following  examples. 

a)  To  specify  that  a  particular  I/O  device  is  to  be  used  for 
output,  the  machine  language  instruction  WRS  (Write  Select)  is  used  to 
prepare  the  appropriate  channel. 

b)  To  specify  that  a  particular  block  of  data  items  is  to  be 
copied  onto  an  output  medium,  the  instruction  RCH  (Reset  and  Load  Channel) 
is  used  to  send  to  the  channel  a  channel  command  word  which  gives  the 
size  of  the  block  of  data  to  be  copied  and  its  location. 

c) To specify that the last block of data has been reached
on a magnetic tape, the instruction WEF (Write End-of-File) is used to
write an end-of-file gap followed by a tape mark on the tape.
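
The implicit nature of these descriptions can be made concrete with a small Python sketch. The table below records only what the text above says each instruction implies; the program fragment, the symbolic addresses and the function `infer_description` are hypothetical and are introduced purely for illustration.

```python
# Characteristics implied by each instruction, following the examples above.
IMPLIED = {
    "CAL":  {"type": "character"},
    "CLA":  {"type": "number"},
    "ADD":  {"type": "number", "scale": "fixed"},
    "FAD":  {"type": "number", "scale": "floating"},
    "DFAD": {"type": "number", "scale": "floating", "precision": "double"},
}

def infer_description(program):
    """Collect, per address, what the instructions imply about the data there."""
    description = {}
    for opcode, address in program:
        description.setdefault(address, {}).update(IMPLIED.get(opcode, {}))
    return description

# Hypothetical program fragment: (opcode, symbolic address) pairs.
program = [("CLA", "X"), ("FAD", "Y"), ("CAL", "TITLE"), ("DFAD", "Z")]
inferred = infer_description(program)
```

No declaration states that Y holds a floating point number; that fact can only be recovered by reading the instructions that manipulate it.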

The characteristics* of data structures provided by machine
languages can be grouped into two categories. One includes the characteristics
of individual data items, and the other the characteristics
of storage media.

* At the end of each section of this chapter a list of the characteristics
of the system under discussion will be presented. Whenever a
new characteristic (not appearing in previous sections) is introduced,
it will be underlined.
1.  The  characteristics  of  individual  data  items  consist  of: 

(i)  the  hardware  provided  character  code, 

(ii)  length, 

(iii)  data  type: 

a)  character  string, 

b)  numbers; 

1)  binary  base, 

2) sign - radix or diminished radix complement
(depending on the hardware),

3)  fixed  or  floating-point  scale. 

2.  The  characteristics  of  storage  media  consist  of: 

(i)  block  size, 

(ii)  end-of-file  labels,  and 

(iii)  device  assignment. 

We  note  that  machine  instructions  are  seldom  used  or  made  available 
to  describe  explicitly  the  structuring  of  sets  of  data  items.  Such 
structures  are  created  and  maintained  by  machine  language  programs. 

2.3  Data  Structures  in  Early  Operating  Systems 

With the development of Operating Systems (OS's), more complex
data structures on storage devices were provided directly to the programmer.
They are described by statements of the OS job control language
(JCL). Previously, these file and storage structures had to be
implemented as part of user-written machine language programs.


Examples of such statements are the $FILE and $LABEL statements
provided by the IBM 7040 JCL. These are illustrated in Figure 2-1.

The $FILE statement is used to describe the characteristics of the
file structure, the positioning of the records on magnetic tape, the
structure of the tape's physical blocks, and the tape unit.

1.  The  file  structure  and  implementation  characteristics  consist 
of: 

(i)  ordering  the  records  in  their  input  sequence,  and 

(ii)  implementing  this  structure  by  sequential  storage. 

2. The record positioning characteristic is the record-to-tape-block
ratio; that is, the number of records per tape block.

3.  The  storage  structure  and  implementation  characteristics  are: 

(i)  tape  naming, 

(ii)  tape  block  size, 

(iii)  labels: 

a)  header  and  trailer  labels  for  tape  reels  and  files, 

b)  count  fields  for  tape  blocks, 

(iv)  fixed  ordering  of  tape  blocks  and  labels  on  the  tape, 

(v)  fixed  occurrence  of  all  blocks  and  labels  specified, 

(vi)  repetition  of  reels  -  given  as  number  of  reels. 

4.  The  device  characteristic  is  read/write  density. 
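
The record positioning characteristic above can be illustrated with a short sketch. It is not IBM 7040 JCL; it only shows, under the assumption that records are kept in their input sequence, how a fixed number of records per block determines the block layout. The function name `block_records` and the sample record names are hypothetical.

```python
def block_records(records, records_per_block):
    """Group records, in their input order, into blocks of a fixed ratio."""
    if records_per_block < 1:
        raise ValueError("records_per_block must be at least 1")
    # Each slice is one tape block; the last block may be only partly filled.
    return [records[i:i + records_per_block]
            for i in range(0, len(records), records_per_block)]


blocks = block_records(["R1", "R2", "R3", "R4", "R5"], records_per_block=2)
print(blocks)
```

With a ratio of two records per block, five records occupy three blocks, the last of which holds a single record.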

The remaining parameters of the statement are used to describe
buffers and actual processing.

The $LABEL statement is used to describe the information in a
label. Labels are used to implement storage structures.

Figure 2-1. IBM 7040 Data Description Statements

2.4 Data Structures in Assembly Languages

Assembly languages were primarily designed to enhance data handling
and, to a lesser degree, to provide mnemonic machine instructions. The
data-oriented pseudo-instructions provided by assembly languages significantly
increase the variety of data structures made directly available
to the user. Thus, many complex data structures that had previously
been created and maintained by user programs can now be declared explicitly.

In assembly languages, elements and statements which deal with
data structures are typified as follows:

1) Symbolic names assigned to data items. These names may be
used to access the data items directly without referring to the address
of the data items. For example, in the IBM 7040 Macro-Assembly Language
MAP, the statement DATA DEC 13 results in the name DATA being assigned
to the location in which a decimal number 13 is stored.

2) Pseudo-instructions that declare data types. For example,
in IBM 7040 MAP, data items may be declared to be octal, OCT; decimal,
DEC; binary coded information, BCI; and variable field data, VFD. This
is illustrated by the following examples:

a) To specify that a data item named DATA is to be interpreted
as the decimal integer 13, the following MAP statement is used:

DATA DEC 13

b) To specify that a data item named ENTRY is to contain the
character C in the first 6 bits of the data item, the following MAP
statement is used:

ENTRY VFD H6/C

3) Pseudo-instructions that describe the structure of data items.
For example, in IBM 7040 MAP, to specify that a block of 6 consecutive
storage locations is to be reserved for storing data items, the following
statement is used:

BSS  6 

4) Pseudo-instructions that describe input/output characteristics
of particular media. For example, in IBM 7040 MAP such statements are
of the form:

name FILE option, ..., option
LABEL option, ..., option

where the options for the FILE statement and LABEL statement are the
same as the options for the IBM 7040 Job Control Language $FILE and
$LABEL described in the previous section.
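
The variable field declaration in item 2b can be pictured as packing bit fields into a machine word. The sketch below is an illustration only, not MAP itself: it assumes the 36-bit word of the IBM 7040, and the 6-bit code used for the character C (octal 23) is an assumed value chosen for the example. The names `pack_fields` and `C_CODE` are hypothetical.

```python
WORD_BITS = 36  # word size assumed for the IBM 7040


def pack_fields(fields):
    """Pack (width_in_bits, value) pairs left to right into one word."""
    word, used = 0, 0
    for width, value in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        used += width
        if used > WORD_BITS:
            raise ValueError("fields exceed the word size")
        word = (word << width) | value
    # Left-justify the packed fields; the unused low-order bits stay zero.
    return word << (WORD_BITS - used)


C_CODE = 0o23  # assumed 6-bit character code for "C"
word = pack_fields([(6, C_CODE)])
print(f"{word:012o}")
```

The single 6-bit field lands in the high-order end of the word, which is the effect the text describes for `ENTRY VFD H6/C`.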

Thus,  the  following  characteristics  of  individual  data  items, 
sets  of  data  items  and  storage  media  are  made  accessible  to  programmers 
in  Assembly  Language. 

1.  The  characteristics  of  individual  data  items  consist  of: 

(i) symbolic naming,

(ii) the hardware provided character code,

(iii) length,

(iv)  data  type: 

a)  character  string, 


b) numbers:

1) binary, decimal or octal base,

2)  character  sign  for  decimal  numbers  and  radix 
and  diminished  radix  complement  for  binary 
numbers, 

3)  fixed  or  floating  point  scale, 

(v)  data  items  identified  by  position. 

2.  The  characteristics  of  sets  of  data  items  consist  of: 

(i)  fixed  order, 

(ii)  fixed  occurrence,  and 

(iii)  sets  of  data  items  identified  by  their  position 

3.  Assembly  languages  depend  on  their  underlying  operating  system 
for  storage  structure. 

2.5 Data Structures in Early Higher-Level Programming Languages

In developing higher-level languages such as FORTRAN and COBOL,
appropriate data structures were provided. For example, FORTRAN, which
was designed for scientific computing, provides array accessing for
handling homogeneous data (i.e., data of the same type).

The data description statements of ANSI FORTRAN have four forms:

1) Declaration statements that describe the structure of individual
data items. In FORTRAN, characteristics such as scale and
precision are treated as additional data types. For example, in FORTRAN
IV, the following "type" declarations are provided:

INTEGER    DOUBLE PRECISION
REAL       LOGICAL
COMPLEX    EXTERNAL

where LOGICAL data items are the values T (or TRUE) and F (or FALSE),
and EXTERNAL data items are data items which are defined externally to
the FORTRAN program. To specify the type of a data item, the name of
the item is listed after the type in a declaration statement, e.g.,

INTEGER OVAL, A, B

2) The declaration statement, which describes the structure of
sets of data items (groups). In ANSI FORTRAN, individual data items can
be grouped together in hierarchic structures which are interpreted by
the processor as arrays. For example, the tree illustrated below can be
interpreted as a 2 x 3 array:


That is, the pairs of data items <a11,a12>, <a21,a22> and <a31,a32>
are interpreted as rows. Arrays are limited to a maximum of three
dimensions. The DIMENSION statement is used to describe such groupings.
The statement has the following format:

DIMENSION array name (n1,n2), ..., array name (n1,n2,n3)

where: array name is the name used to refer to the array, and
n1, n2, n3 are the numbers of elements in each of the
dimensions of the array allowed in ANSI FORTRAN.

For example, the statement

DIMENSION A(2,3)

describes a 2 x 3 array called A.

Data items in the vectors are accessed by array indexing.

3) The FORMAT statement, which describes input and output data
structures. The statement is used to describe data type and length
for each data item in a record to be input or output. For example,
in ANSI FORTRAN, the statement has the following format:

FORMAT (data item specification, ..., data item specification)

where a data item specification consists of two parts: a data type
and a data length part. The types are:

F real with no exponent
E real with exponent
D real with double precision exponent
I integer
L logical (character string T or F)
A character string
H Hollerith (character string used for output only)

Length is given as number of characters per data item. For
example, A6 describes a data item which is a string of 6 characters.
For real data items, in addition to length, the number of digits to the
right of the decimal point is specified. For example, F8.2 describes a
data item which is a real number with a maximum length of 8 characters
and which has 2 digits following the decimal point.

4) Input/Output statements that describe the order of the data
items to be input or output and the device to be used. The statements
have the following format:

{READ | WRITE} (device number, format statement number) data name, ..., data name

where: device number refers to a specific device,
format statement number refers to the format statement
describing the data items being input or output,
data name refers to the data item or group (array values)
being input or output.
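
The way a FORMAT statement describes a record, as a sequence of typed fixed-width fields, can be sketched as follows. This is an illustration, not a FORTRAN implementation: only the A, I and F types from the list above are handled, and the tuple notation for a specification, the function name `read_record` and the sample record are assumptions made for the example.

```python
def read_record(line, specs):
    """Split a fixed-width record according to (type, width[, digits]) specs."""
    values, pos = [], 0
    for spec in specs:
        kind, width = spec[0], spec[1]
        text = line[pos:pos + width]
        pos += width
        if kind == "A":                     # character string, taken as-is
            values.append(text)
        elif kind == "I":                   # integer
            values.append(int(text))
        elif kind == "F":                   # real with no exponent
            digits = spec[2]
            stripped = text.strip()
            if "." in stripped:
                values.append(float(stripped))
            else:
                # No explicit point: the last `digits` characters are
                # taken as the fractional part.
                values.append(int(stripped) / 10 ** digits)
        else:
            raise ValueError(f"unsupported type {kind!r}")
    return values


# A record laid out as A6, I4, F8.2 (18 characters in all).
line = "SMITH " + "  42" + "   12.50"
print(read_record(line, [("A", 6), ("I", 4), ("F", 8, 2)]))
```

The same list of specifications could drive output as well as input, which is the sense in which the FORMAT statement describes the record independently of the READ or WRITE that uses it.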

Thus,  the  following  characteristics  of  data  structures  are 
made  accessible  to  programmers  by  the  data  description  statements  of 
FORTRAN: 

1. The characteristics of individual data items consist of:

(i) symbolic naming,

(ii) the hardware provided character code,

(iii) fixed lengths as specified by the user,

(iv) data type:

a) character string,

b) number:

1) binary or decimal base,

2)  radix  or  diminished  radix  complement  depending 
on  hardware  for  binary  numbers,  character  sign 
or  no  sign  for  decimal  numbers, 


3)  fixed  or  floating  point  scale, 

(v)  data  items  identified  by  their  position. 

2.  The  characteristics  of  records  consist  of: 

(i)  array  accessing  (balanced  trees), 

(ii)  fixed  ordering, 

(iii)  fixed  occurrences, 

(iv)  groups  of  data  items  identified  by  their  position. 

3. FORTRAN depends on its underlying Operating System for its
storage structure.
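
Array accessing identifies a data item by its position. As a minimal sketch, assuming the column-major storage order conventionally used by FORTRAN and subscripts that start at 1, the position of an element within the block of storage reserved for an array can be computed from its subscripts and the declared dimensions. The function name `column_major_offset` is hypothetical.

```python
def column_major_offset(subscripts, dims):
    """Zero-based storage offset of an element, first subscript varying fastest."""
    if len(subscripts) != len(dims):
        raise ValueError("one subscript is required per dimension")
    offset, stride = 0, 1
    for s, n in zip(subscripts, dims):
        if not 1 <= s <= n:
            raise IndexError(f"subscript {s} outside 1..{n}")
        offset += (s - 1) * stride
        stride *= n
    return offset


# For DIMENSION A(2,3) the storage order is
# A(1,1), A(2,1), A(1,2), A(2,2), A(1,3), A(2,3).
print(column_major_offset((1, 2), (2, 3)))
print(column_major_offset((2, 3), (2, 3)))
```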

Because the COBOL language was designed for handling large quantities
of data, more importance was given to the data description statements
of the language than in FORTRAN. These statements are written in
separate sections of a COBOL program. The Data Division is the section
for describing the data items, records, files, working storage and program
constants. Another section, called the Environment Division, is
for describing the storage media. In it, information concerning file
selection is given, and the equipment configuration (tape station, printer,
etc.) is described.

1) In COBOL's Data Division there is one statement for describing
the organization of data items in records and one statement for describing
the organization of records into files:

a) Each data item or group of data items that is to appear
in a record is described by a statement of the form illustrated in
Figure 2-2. This statement is used to describe:

i) the level at which the data item or group of data
items is to occur in the hierarchic record,
ii) the data type (e.g., character string = DISPLAY,
numeric string = COMP),
iii) the length of the data item,
iv) the number of times the data item or group of data
items is to occur in each record,
v) the alignment of the data item in respect to
word boundaries and to fixed length strings of
character positions.

Figure 2-2. The ANSI COBOL Statement for Describing
a Data Item or a Group in a COBOL Record
(US 1968)


b) The organization of COBOL records in a COBOL file is
described by a statement of the form illustrated in Figure 2-3. This
statement is used to describe:

i) the size of storage blocks,

ii) the size of the records stored in the blocks,

iii) any labels to appear on the storage tape,

iv) the names of records appearing in the file.

Figure 2-3. The ANSI COBOL Statement for
Describing a COBOL File (US 1968)

2) In COBOL's Environment Division, there is one section that
is used to describe input and output conventions. In it, equipment
assignments and certain physical characteristics of each file to be used
by the program are described by a statement of the form illustrated in
Figure 2-4. This statement is used to describe the device on which
the file is stored.


FILE-CONTROL.  SELECT [OPTIONAL] file-name
    ASSIGN TO [integer-1] implementor-name-1 [, implementor-name-2] ...
    [FOR MULTIPLE REEL]  [RESERVE {integer-2 | NO} ALTERNATE [AREA | AREAS]]


Figure 2-4. The ANSI COBOL Statement for Describing
the Storage Convention of a COBOL File
(US 1968)


Thus,  the  data  structures  that  are  made  accessible  to  programmers 
by  COBOL  can  be  characterized  in  the  following  way. 

1. The characteristics of individual data items consist of:

(i) symbolic naming,

(ii) the hardware provided character code,

(iii) fixed lengths as specified by the user,

(iv)  data  types: 

a)  character  string, 

b) number:

1) binary or decimal base,

2) sign - radix or diminished radix complement
(depending on the hardware) for binary numbers,
and character sign or no sign for decimal
numbers,

3)  fixed  or  floating  point  scale, 

(v)  value  alignment  (justification)  with  blank  or  zero 
padding, 

(vi)  value  string  alignment  (synchronization)  with  respect 
to  computer  words  with  blank  or  zero  padding, 

(vii)  data  items  identified  by  their  position. 

2.  The  characteristics  of  records  consist  of: 

(i)  hierarchic  structure, 

(ii)  fixed  order, 

(iii)  fixed  occurrences, 

(iv)  fixed  repetition  ordered  as  input, 

(v)  groups  of  data  items  identified  by  their  position. 

3. COBOL depends on its underlying Operating System for its
storage structures.
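
The record characteristics listed above (hierarchic structure, fixed lengths and fixed repetition) are enough to determine the size of a record from its description alone. The following sketch assumes a simplified description in which each entry carries a level number, a name, a length (omitted for groups) and a repetition count; it is not COBOL syntax, and the record layout and the function name `record_length` are hypothetical.

```python
def record_length(items):
    """Total length of a record described by (level, name, length, occurs) entries.

    `length` is None for a group item; a group's length is the sum of the
    items subordinate to it (those that follow with a higher level number).
    """
    def size(index):
        level, _name, length, occurs = items[index]
        nxt = index + 1
        if length is None:
            length = 0
            while nxt < len(items) and items[nxt][0] > level:
                child, nxt = size(nxt)
                length += child
        return length * occurs, nxt

    total, _ = size(0)
    return total


EMPLOYEE = [
    (1, "EMPLOYEE", None, 1),
    (2, "NAME", 20, 1),
    (2, "ADDRESS", None, 1),
    (3, "STREET", 30, 1),
    (3, "CITY", 15, 1),
    (2, "MONTHLY-PAY", 6, 12),   # repeated 12 times in each record
]
print(record_length(EMPLOYEE))
```

Because order, occurrence and repetition are all fixed, every data item also has a fixed position in the record, which is why items can be identified by position.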

2.6 Data Structures in Third-Generation Operating Systems

In their current stage of development, Operating Systems (OS's)
are providing more file and storage structure options than early OS's.
The creation and maintenance of these structures are treated as a
set of services separate from those involved in scheduling programs.
The part of an OS which supports these services is referred to as the
data management system (DMS) of the operating system. Among these
services are the moving of data between storage devices and main memory,
and the accessing of data in DMS maintained structures. Additional JCL
statements, known as DMS statements, are provided to evoke DMS services.

In  general,  DMS's  provide  their  users  with  a  number  of  file  and 
storage  structures.  To  store  data  in  such  structures,  the  user  proceeds 
as  follows: 

(i)  he  names  the  particular  structure  in  a  DMS  statement, 

(ii) he lists the parameters which select those options provided
by the DMS (if any), and

(iii)  he  enters  his  data. 

The  data  management  service  so  evoked  moves  the  data  from  the  input 
device to the appropriate storage devices and stores it in the described
structures. 

For example, the DMS II of the RCA SPECTRA 70/46 TSOS (RCA 1971)
provides its users with five structures and related input/output conventions.
Collectively, these structures and conventions are called
access methods. They are:

1) PAM (Primitive Access Method). This method provides only a
particular record format (fixed in length) and storage on either
direct-access devices or on single reel, standard blocked tape. PAM creates
and accesses files only in random order. The user must himself handle
the blocking and deblocking of records.

2) SAM (Sequential Access Method). This method provides either
fixed length, variable length or undefined record formats (where records
with undefined formats are stored one to a block). SAM creates and
accesses files in sequential order only. It performs all blocking,
deblocking and buffering for the user.

3) ISAM (Index Sequential Access Method). It provides either
fixed or variable length record formats and storage on direct-access
devices only. Records are maintained by means of a directory whose
entries point to the records to reflect the correct sequence. In other
words, records may not be in sequential order physically. The field
whose values determine the sequence is called the key. Thus, ISAM can
access files in a sequential or non-sequential order. In terms of
storage structure, an ISAM file is made up of data blocks (2048 bytes)
and directory blocks. Data blocks contain the user's records which are
ordered initially according to the values of the key field. Directory
blocks contain pointers to data blocks. ISAM performs all blocking,
deblocking and buffering for the user.

4) BTAM (Basic Tape Access Method). This method provides either
fixed length or undefined record formats (where records are stored one
per block) and storage on tape only. BTAM is used to provide efficient
accessing of tape blocks.

5) EAM (Evanescent Access Method). It provides fixed length
record formats and storage on direct-access devices only. EAM creates
and accesses temporary files only in a random order. Because they are
temporary, EAM files have no labels and require no cataloguing or
security checks.

Data  structures  in  these  five  access  methods  are  similar  in  several 
respects.  In  fact,  only  three  structures  are  provided  for  records: 

1) Fixed length - in which each record contains exactly the same
number of bytes. Standard format is known to all DMS access
methods.

2) Variable length - in which each record may contain a different
number of bytes. In each variable length record, the first
two bytes of the record contain the characters "11", and
the second two bytes contain the length of the record.

3)  Undefined  -  in  which  records  are  identical  in  length  to  the 
input/output  buffers  defined  for  the  access  method. 

There  are  three  ways  of  organizing  records  into  files: 

1)  random  organization, 

2)  sequential,  and 

3)  indexed  sequential. 
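
The indexed sequential organization can be sketched in a few lines. This is a simplified illustration of the idea described for ISAM above, not the TSOS implementation: records are kept in whatever physical order they arrive, while a directory of key and position pairs is kept in key order so that the file can be read either by key or in key sequence. The class and method names are hypothetical, and blocks, buffering and overflow handling are ignored.

```python
import bisect


class IndexedFile:
    """Records in arrival order plus a directory kept sorted by key."""

    def __init__(self):
        self.records = []     # physical order (order of insertion)
        self.directory = []   # sorted (key, position) entries

    def insert(self, key, record):
        self.records.append(record)
        bisect.insort(self.directory, (key, len(self.records) - 1))

    def get(self, key):
        """Non-sequential access: find one record by its key."""
        i = bisect.bisect_left(self.directory, (key, -1))
        if i < len(self.directory) and self.directory[i][0] == key:
            return self.records[self.directory[i][1]]
        raise KeyError(key)

    def in_key_order(self):
        """Sequential access: follow the directory, not the physical order."""
        return [self.records[pos] for _key, pos in self.directory]


f = IndexedFile()
f.insert(30, "C")
f.insert(10, "A")
f.insert(20, "B")
print(f.get(20))
print(f.in_key_order())
```

The physical order of the three records is C, A, B, but the directory presents them as A, B, C, which is the sense in which records "may not be in sequential order physically".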

For storage, records may be blocked and unblocked automatically,
devices may be tape or direct-access, and blocks may be standard (2048
bytes) or nonstandard (≤ 4096 bytes). Control codes such as tape marks,
count fields, etc. are handled automatically and may not be specified
by the user.

Thus, the following characteristics of file and storage structures
are made accessible to programmers by the DMS data description statements.




1.  The  characteristics  for  organizing  records  into  files  and 
implementing  the  structure  consist  of: 

(i)  structuring  records  by  input  sequence, 

(ii)  structuring  records  by  value  (key), 

(iii) implementing structures by

a)  sequential  positioning,  and 

b)  by  pointers: 

1)  stored  in  tables  or  embedded  in  records, 

2)  given  as  absolute  address  or  relative  to  some 
origin. 

2. The characteristics for positioning records in device blocks
consist of:

(i)  the  record-to-block  ratio,  and 

(ii)  the  distribution  of  records  such  that  records  either 
are  maintained  whole  or  are  split  between  blocks. 

3.  The  characteristics  for  organizing  storage  blocks  and  imple¬ 
menting  this  structure  consist  of: 

(i)  block  naming, 

(ii) formatting for the following supported devices:

magnetic tape, magnetic disk, cards, and printer,

(iii) block length specification for supported devices,

(iv)  labels  for  supported  devices, 

(v)  fixed  order  of  device  formats, 

(vi)  fixed  occurrences  of  device  formats, 

(vii) repetition of formats for tape reels, disk levels,
cards and printer pages.

2.7 Data Structures in Current Versions of Higher-Level Programming Languages

Current higher-level programming languages have been developed to
take advantage of the data management services provided by operating
systems and to satisfy user requirements for more complex working
structures.

For example, RCA SPECTRA 70/46 ANSI COBOL (RCA 1969) has statements
to evoke SAM and ISAM and their related data structures.

The COBOL Data Division has been enhanced: new internal formats
have been added, repeating groups can be ordered, and repetition numbers
can vary for different record occurrences. The clauses used to
specify these options are illustrated in Figure 2-5.


USAGE IS  { DISPLAY
          | COMPUTATIONAL
          | COMPUTATIONAL-1
          | COMPUTATIONAL-2
          | COMPUTATIONAL-3
          | INDEX }

Figure 2-5, a. The COBOL Statement for Declaring Data Types

OCCURS  [integer-1 TO] integer-2 TIMES
        [DEPENDING ON data-name-1]
        [ {ASCENDING | DESCENDING} KEY IS data-name-2
          [, data-name-3] ... ] ...
        [INDEXED BY index-name-1 [, index-name-2] ... ]

Figure 2-5, b. The COBOL Statement for Specifying Repetition

Figure 2-5. Enhanced COBOL Description Statements

PL/I is an example of a higher-level programming language that
was designed to incorporate a larger number of record structures than
other languages available at the time of its conception. It provided
array accessing, hierarchic structuring and string processing for data
items and groups of data items.

PL/I provides a rich set of characteristics for structuring and
implementing data items (IBM 19^5):

(i)  symbolic  naming, 

(ii)  the  hardware  provided  character  code, 

(iii) fixed and varying lengths as specified by the user,

(iv) data types:

a)  character  string, 

b)  number: 

1)  binary  or  decimal  base, 

2) sign - radix or diminished radix complement
(depending on the hardware) for binary numbers, and
character sign or no sign for decimal numbers,

3) fixed or floating point scale,

(v) value alignment with zero or blank pad characters,

(vi) data items identified by position.

These  data  descriptive  elements  are  combined  in  declaration  statements 
of  the  form: 


i)  DECLARE data item name { {CHARACTER | BIT} (n) [VARYING] | PICTURE picture string }

or

ii) DECLARE data item name (m,n) {FIXED | FLOAT} {BINARY | DECIMAL}

To group data items into hierarchic structures and structures
accessible by array indexing, PL/I provides the following elements:

1) a clause which is used to specify the dimensions for array
accessing. It has the form:

(m1, ..., mn) for an n dimensional array, where the
ith dimension has mi elements. This clause is used in a
DECLARE statement:

DECLARE data name (m1, ..., mn) ...

2) a clause which is used to describe hierarchic relationships
between data items. It has the same form as the level number
clause in COBOL. It is used in a DECLARE statement:

DECLARE level number data item ...

level number data item ...

Such hierarchic structures may also be accessed by array
indexing.

For file and storage structures, PL/I provides statements which
are used to invoke the DMS access methods of its underlying operating
system.

The characteristics of data structures that are made accessible
to the programmer by the data description elements of many current
higher-level languages are summarized in Section 2.10.

2.8 Data Structures in Data Base Management Systems

Data Base Management Systems are an outgrowth of Information
Storage and Retrieval (ISR) systems. ISR systems are designed to manage
large quantities of a particular type of data. For example, one early
system, MEDLARS, was created to manage documents for the National Library
of Medicine (La 1968).

In  these  systems,  since  only  one  type  of  information  was  to  be 
used,  only  one  type  of  file  structure  was  required.  Also,  input  and 
output  routines  were  specialized  to  handle  the  file  structure  most 
effectively. As a whole, ISR systems were individually tailored for
applications such as text-handling and record-keeping.

The development of more generalized text-handling and record-keeping
systems led to today's generalized Data Base Management Systems
(DBMS's) (CO 1969).

Every  DBMS  has  a  language.  The  data  description  statements  of 
the  language  specify  the  structure  of  data  maintained  by  the  DBMS.  In 
general, the data description statements form the largest part of a
DBMS's language.

For example, in the MARK IV DBMS developed by Informatics Inc.
(CO 1969), raw data must be input in the format in which it is to be
stored. MARK IV formats can be characterized in the following way.

1. The characteristics of individual data items consist of:

(i) symbolic naming,

(ii) the hardware provided character code,

(iii) fixed lengths as specified by the user,

(iv) data types:

a)  character  string, 

b)  number: 

1)  binary  or  decimal  base, 

2) sign - radix or diminished radix complement
(depending on the hardware), and
character signs or no signs for decimal numbers,

3)  fixed  or  floating  point  scale, 

(v)  data  items  identified  by  their  position. 

2.  The  characteristics  of  records  consist  of: 

(i)  hierarchic  structure, 

(ii)  fixed  order, 

(iii)  fixed  occurrences, 

(iv)  fixed  or  varying  repetitions  ordered  as  input, 

(v)  groups  of  data  items  identified  by  their  position. 

3. MARK IV depends on its underlying Operating System for its
storage structures.

MARK IV's language is a tabular language. Forms are provided in
which  a  user  selects  options  provided  by  the  system. 

MARK IV is a self-contained DBMS. It is not embedded in any
higher-level programming language. DBMS's which are embedded in some
higher-level languages are called host-language DBMS's. They are
designed to enhance their host language. This development combines the
record structures provided by the host languages with the file and
storage structures provided by the DBMS. COBOL and the Honeywell-General
Electric Co.'s Integrated Data Store (IDS) together form an
example of this type of system (CO 1969).

In COBOL-IDS, COBOL structures are used at the data item and
record level. These structures are described by the standard COBOL
statements. The enhancement comes at the file level. IDS adds the
capability to describe network relationships among records. IDS networks
can be viewed as interconnecting ring structures. The interconnections
are maintained by embedded pointers. Each record in IDS may participate
in more than one ring. Thus, a single record may be associated with
many other records. In each IDS ring there is one record which is
treated as a master record. It contains control information. The
remaining records in the ring are called detail records. Any record
may be master in one ring and detail in another. The data description
statements used to describe the characteristics of these rings are in the
form of additional clauses in the COBOL record statement. Each ring
relationship is defined at level 98 of a record description. In IDS
terminology a ring is called a CHAIN. The clause for declaring a record
to be a chain master has the form:

98 chain-name CHAIN MASTER.

The clause for declaring a record to be a chain detail has the form:

98 chain-name CHAIN DETAIL
   [; SELECT UNIQUE MASTER]
   [; MATCH-KEY IS data-name]
   [; CHAIN-ORDER IS SORTED]
   [; {ASCENDING | DESCENDING} SORT-KEY IS data-name]
   [; RANDOMIZE ON data-name]
   [; DUPLICATED NOT ALLOWED] .

This clause specifies the chain in which the record is to be a detail,
the order in which detail records are to occur (if they are to be ordered),
and the field from which a hashed address of the record is to be derived
(if this is desired).
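
The ring organization described above can be pictured with a small sketch. The Python fragment below is only an illustration and is not IDS itself: the class names Record and Chain, the method names, and the sample record names are assumptions chosen for this example. Each record carries one embedded "next" pointer per chain, the master closes the ring, and a record may be master of one chain and detail in another.

```python
class Record:
    """A record with one embedded 'next' pointer per chain it belongs to."""

    def __init__(self, name):
        self.name = name
        self.next = {}  # chain name -> next record in that ring


class Chain:
    """A ring: the master leads to the first detail, and the last detail
    points back to the master."""

    def __init__(self, chain_name, master):
        self.chain_name = chain_name
        self.master = master
        master.next[chain_name] = master  # empty ring: master points to itself

    def add_detail(self, detail):
        # Walk to the record whose pointer closes the ring, then splice in.
        last = self.master
        while last.next[self.chain_name] is not self.master:
            last = last.next[self.chain_name]
        last.next[self.chain_name] = detail
        detail.next[self.chain_name] = self.master

    def details(self):
        # Follow the embedded pointers until the ring returns to the master.
        names = []
        rec = self.master.next[self.chain_name]
        while rec is not self.master:
            names.append(rec.name)
            rec = rec.next[self.chain_name]
        return names


dept = Record("DEPT-A")
emp1, emp2 = Record("JONES"), Record("SMITH")
skill = Record("WELDING")

staff = Chain("STAFF", dept)     # DEPT-A is master of the STAFF chain
staff.add_detail(emp1)
staff.add_detail(emp2)

skills = Chain("SKILLS", emp1)   # JONES is detail in STAFF, master in SKILLS
skills.add_detail(skill)
```

In this sketch new details are always appended at the end of the ring; the sorted and randomized placements offered by the CHAIN DETAIL clause are not modelled.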

MARK IV and COBOL-IDS represent two different classes of DBMS.
However, they are both implemented as application programs and are not
parts of the operating systems. Many system resources are thus
unavailable to the user. Furthermore, privacy protection and access control,
which are vital to DBMS users, are difficult to enforce. Therefore, a
different approach to building a DBMS was taken by the designers of
the Extended Data Management Facility (EDMF) implemented at the Moore
School of Electrical Engineering at the University of Pennsylvania
(Ma 1971). The EDMF was implemented as a part of the RCA SPECTRA 70/46
Time Sharing Operating System (TSOS). Statements of the EDMF are in
the form of either TSOS commands, macro-calls which may be used by the
regular applications programmer in assembly language programs, or
built-in functions for the FORTRAN and COBOL languages.

The set of record and file structures provided by the EDMF is
one of the most extensive that have been implemented. EDMF provides
record structures which go beyond the COBOL structures (Ho 1971). It
provides the following characteristics.

1.  The characteristics of individual data items consist of:

(i)  symbolic  naming, 

(ii)  the  hardware  provided  character  code, 

(iii)  fixed  or  variable  lengths  as  specified  by  the  user, 

(iv)  data  types: 

a)  character  string, 

b)  number: 


1)  decimal  or  binary  base, 

2)  sign  -  radix  or  diminished  radix  complement 
(depending  on  the  hardware)  for  binary  numbers, 
and  character  sign  or  no  sign  for  decimal 
numbers, 

(v)  value  alignment  -  left  for  character  strings  and 

right  for  numbers,  with  zero  or  blank  pad  characters, 

(vi)  data  items  identified  by  position  and  by  attribute 
names  used  as  delimiters. 


2.  The  characteristics  of  records  consist  of: 

(i)  hierarchic  structure, 

(ii)  fixed  order, 

(iii)  fixed  or  optional  occurrences  of  data  items  and  groups, 

(iv)  fixed  and  variable  repetition  of  data  items  and  groups 
ordered  as  input, 

(v)  groups identified by position and by using attribute
names as markers.

At the file level, the EDMF allows records to be linked together
into lists when the records contain the same data items (called
keywords). A record may be linked into any number of lists. Pointers to
the heads of the lists are stored in directories (tables) in ascending
lexicographical order. By setting limits on list lengths, files may
be implemented completely with pointers embedded in records, with
tables of pointers, or with some combination of the two. This is under the
user's control, and allows him to organize his data in a wide range of
structures, including inverted, multilist, and indexed random
organization (Hs 1970). EDMF seems to be the only existing DBMS to allow the
user this kind of control over the implementation of his file.
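
The effect of a limit on list lengths can be sketched as follows. This is a minimal illustration under stated assumptions, not the EDMF implementation: the function name build_directory, the record addresses, and the keywords are hypothetical. With no limit, each keyword has a single list threaded through the records by embedded pointers (a multilist file); with a limit of one, every pointer sits in the directory table (an inverted file); intermediate limits give combinations of the two.

```python
def build_directory(records, max_list_len=None):
    """records: {address: set of keywords}.

    Returns (directory, links):
      directory - list of (keyword, [list-head addresses]) in ascending
                  lexicographical order of keyword
      links     - {(keyword, address): next address or None}, standing in
                  for the pointers embedded in the records
    """
    by_keyword = {}
    for addr in sorted(records):
        for kw in records[addr]:
            by_keyword.setdefault(kw, []).append(addr)

    directory, links = [], {}
    for kw in sorted(by_keyword):
        addrs = by_keyword[kw]
        step = max_list_len or len(addrs)   # no limit: one list per keyword
        heads = []
        for i in range(0, len(addrs), step):
            chunk = addrs[i:i + step]
            heads.append(chunk[0])
            # Chain the records of this list; the last one ends the list.
            for cur, nxt in zip(chunk, chunk[1:] + [None]):
                links[(kw, cur)] = nxt
        directory.append((kw, heads))
    return directory, links


records = {
    10: {"ENGINEER", "PHILA"},
    20: {"ENGINEER"},
    30: {"ENGINEER", "PHILA"},
}

multilist_dir, multilist_links = build_directory(records)
inverted_dir, inverted_links = build_directory(records, max_list_len=1)
```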

Each one of the above DBMS's was designed to enhance various
characteristics at either the data item, record or file level, or at
all three. The level and degree of enhancement vary from DBMS to DBMS.
A summary of the most advanced DBMS features will be provided in
Section 2.10.

2.9  The Data Description Language of the CODASYL Data Base Task Group

The CODASYL* Data Base Task Group (DBTG) was organized to unify
work done on current DBMS data description languages. The goal of the
DBTG is to produce a single data description language (DDL) in which
all current data structures at the data item, record and file levels
can be described. This DDL (CO 1971) includes:

*  CODASYL (Conference on Data Systems Languages) is a group originally
formed to create a business-oriented language. It produced COBOL and
has now extended its interests to DBMS's.

1)  the COBOL Data Division, which allows the user to specify
record formats. Unlike the EDMF, the CODASYL DDL does not allow varying
length data items, varying repetitions, or optional occurrences of data
items.

2)  statements describing network structures. The concept of a
SET has been developed to describe file structures. A SET is a
sequentially ordered set of records. Each SET has one "owner" record and
several "member" records. The concept of "owner" record is similar to
that of "master" record in IDS. Member records of SET's are ordered
in either of two ways:

(a)  Records  may  be  ordered  by  ascending  or  descending 
sequences  based  on  specific  keys. 

(b)  Records may be ordered in relation to existing members
of the SET as they are input. That is, when a new record
is input, it can be automatically placed as the last or
first record of the SET.

The  SET  concept  is  similar  to  the  IDS  chain. 
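
The two ways of ordering member records can be sketched as below. This is an illustrative fragment only and does not use CODASYL DDL syntax: the class name RecordSet, the order values "first", "last" and "sorted", and the sample records are assumptions made for the example.

```python
class RecordSet:
    """One owner record plus an ordered sequence of member records."""

    def __init__(self, owner, order="last", key=None, descending=False):
        self.owner = owner
        self.members = []
        self.order = order            # "first", "last", or "sorted"
        self.key = key                # key field used when order == "sorted"
        self.descending = descending

    def insert(self, record):
        if self.order == "first":     # (b) placed relative to existing members
            self.members.insert(0, record)
        elif self.order == "last":
            self.members.append(record)
        elif self.order == "sorted":  # (a) ascending or descending on a key
            self.members.append(record)
            self.members.sort(key=lambda r: r[self.key],
                              reverse=self.descending)
        else:
            raise ValueError("unknown order: " + self.order)


children = [{"name": "JOHN", "age": 10}, {"name": "MARY", "age": 6}]

by_age = RecordSet(owner="JONES", order="sorted", key="age")
newest_first = RecordSet(owner="JONES", order="first")
for child in children:
    by_age.insert(child)
    newest_first.insert(child)
```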

3)  statements describing file implementation. The CODASYL DDL,
at the file level, allows the user to specify whether a SET of records
is to be implemented either with embedded pointers or with tables of
pointers. However, these cannot be combined as in the EDMF, and the user
has no control over the pointers or table structure.

In summary, the following characteristics of data structures
are made available to the user by the CODASYL DDL.

1.  The characteristics of individual data items consist of:

(i)  symbolic  naming, 

(ii)  fixed  lengths  as  specified  by  the  user, 

(iii)  data  types: 

a)  character  string, 

b)  number: 

1)  binary  or  decimal  base, 

2)  sign  -  radix  or  diminished  radix  complement 
(depending  on  hardware)  for  binary  numbers, 
and  character  sign  or  no  sign  for  decimal 
numbers, 

3)  fixed  or  floating  point  scale, 

(iv)  value  alignment  with  blank  or  zero  padding, 

(v)  data  items  identified  by  their  position. 

2.  The characteristics of records consist of:

(i)  hierarchic  structure, 

(ii)  fixed  order, 

(iii)  fixed  occurrences, 

(iv)  fixed and dependent repetitions ordered as input,
(v)  groups  identified  by  their  position. 


3.  The structure and implementation characteristics of files
consist of:

(i)  structuring  by  input  sequence, 

(ii)  structuring  by  criteria  on  keys  (values): 

a)  criteria comparisons: ≤, =, ≥,

b)  conjunctions of criteria,

(iii)  implementation:

a)  by  embedded  pointers, 

b)  by  tables  of  pointers. 

4.  The CODASYL DDL will depend on its implementation for storage
structures. 

The CODASYL DDL is an attempt to create a common front-end
language for describing data structures to DBMS's. There is therefore a
degree of overlap between the CODASYL DDL and the GDDL developed herein.
Before this overlap is discussed, it should be pointed out again that
GDDL is designed to be a language for completely describing data
structures and for data conversion. The CODASYL DDL is not intended
to specify data conversion. Furthermore, GDDL provides the capability
of describing storage structures, whereas the CODASYL DDL does not. At
the record level, the CODASYL DDL is based on COBOL, and we show in
Appendix C that GDDL has more descriptive power than COBOL at the record
level. This additional power is obtained by providing more general
capabilities for specifying record implementation. At the file
level, the CODASYL DDL is designed to describe just those file structures
existing in current systems. GDDL is designed to provide much greater
descriptive power at the file level. The power is provided by
generalizing current file structuring technology, essentially by allowing
the dependency of file structure on data values, record structure, and
record implementation to be described.

2.10  Summary

Two trends have appeared in the handling of data by software
systems. First, the data structures provided have become increasingly
elaborate, and secondly, the user has been given more and more
explicit control over setting up the data structures required.

The earliest systems provided the user with certain structural
options at the data item level. These options were, however,
provided implicitly through a selection of machine instructions.
Successive systems provided more capabilities at the record level, and
allowed these to be declared explicitly. It was first in operating
systems that structuring facilities were offered at the file level.
Typically, the structures provided were limited to a few options
which frequently included sequential and indexed sequential
structures.

With the development of DBMS's, users were given more control
over the implementation and structure of both records and files.
However, they still have no control over, or even knowledge of, the
storage structures used.


The DDL presented herein takes these two trends towards their
logical conclusion. First, the DDL can describe a more general class
of data structures than that provided by current data processing
technology. Secondly, the DDL allows every aspect of a data structure
at each level to be described explicitly.

Those aspects of data structures which have been identified in
the preceding sections have been summarized in Table 2-1. This
table is organized to provide a convenient means of evaluating the
DDL and its underlying model in later chapters.

CHAPTER 3  RECORD DESCRIPTION


3.1  Introduction 

In this chapter we begin our task of showing how the organization
of data can be explicitly described. We present the model for record
structure that is the foundation for the design of GDDL's record
description features. We show that the model is complete for record
description in the sense that the record structures of Table 2-1 can be
described in the model. We also discuss how the model can describe
certain generalizations of present record structures. Then we show
that the record description statements of GDDL are based on this model.
In this way we show that GDDL is also complete and generalized in the
above senses. We further demonstrate the completeness of GDDL by noting
that the COBOL record description features are properly contained in
GDDL and by providing a set of examples which illustrate the ability
of GDDL to describe existing record organizations.

3.2  A Model of Record Structures

We  begin  this  section  by  providing  an  intuitive  introduction  to 
the  model. 

The smallest meaningful piece of information we will call a
"data item". Data items are the components which are organized into
records.

Conceptually, a data item is a string of characters, which
provides a value for the data item, together with an identification of the
type or class of information to which the value belongs. This type
or class of information we call the attribute of the data item.

When  a  data  item  is  represented  on  a  storage  medium,  there  must 
be  rules  which  determine  how  this  data  item  is  implemented  as  a  bit 
string. 

When a user is organizing data items for storage on and retrieval
from a computer medium, he identifies a particular level of organization
which is to be stored and retrieved as a single unit when the data is
being used. This level of data item organization we call the record
level. A convenient way to conceptualize the organization of data items
at the record level is as a hierarchy. It is certainly the case that
existing software systems (e.g., COBOL, MARK IV, IDS, EDMF, and the
CODASYL DDL) provide hierarchies for organizing data items into records.
The records are themselves finally represented on a storage medium as a
bit string. So again there must be rules for specifying how a particular
organization conceived by a user is to be represented as a bit string.
There are then the following components to this process of data
organization:

for data items:

(1)  the conceptual structure of data items,
(2)  the encoding of this structure into a bit string, and
(3)  the resulting bit string representation;

for records:

(1)  the conceptual structure of the records,
(2)  the encoding of the record structure into a bit string, and
(3)  the resulting bit string representation.

We therefore have to model each of these components. The conceptual
structure of data items and records is modelled in terms of the ideas
of attribute and value by generalizing the work of (Me 1987), (Ch 1968),
(he 1970), and (Hs 1971). The bit string is simply a sequence of 0's and
1's. The encoding of the conceptual structure is modelled directly in
terms of characteristics for encoding attributes and values as bit
strings. The complete model will be presented in two steps. First the
model of data items will be described and then the model of records.

3.2.1  The  Model  of  Data  Items 

3.2.1.1  The Concept of Data Items

The concept of a data item can be described in terms of two
primitives, attribute and value, and a definition of data item based on
these primitives.

Intuitively, an attribute is a quality, such as size or weight,
that is ascribed to an object. For each attribute, there is a set of
measures or quantities, known as values. A single value to be
associated with the attribute is selected from this set. For example,
a measure for the attribute weight is selected from the set of real
numbers.

Definition 3-1.  A data item is an ordered pair of the form < a, v >
where a is an attribute and v is a value.

For example, the pairs < name, JONES >, < age, 32 >, < sex, M >,
< school, NEWTOWN HIGH SCHOOL >, < school, UNIVERSITY OF PENNSYLVANIA >
are data items.

In representing a data item on a computer medium (such as cards,
tape, etc.) both the attribute and the value must be encoded. We shall
consider the rules for each kind of encoding separately.

3.2.1.2  Encoding Values

A value is encoded if it is transformed into a bit string according
to the following encoding rule. Such a string will be called a value
string. The rule for encoding a value is simply a detailed specification
of the six characteristics listed below:

1.  Character codes. Strings of binary digits are used to encode
characters such as letters, numbers and punctuation signs. Character
codes have been standardized to the extent that all new computers use
either of two codes: USASCII (or ASCII) and EBCDIC. However, it is
not sufficient to be able to specify either ASCII or EBCDIC, as there
are other codes which are in use on earlier computers. Also, users of
large data bases employ what are, in effect, new character codes to
compress data. Thus, to be completely general, it must be possible to
describe any character code. One way to describe a character code is
to list for each character the code in terms of its bit string
representation.

Associated with a character code is a sort order. To describe
the sort order, the characters of the code can be listed in the sort
order.

When values are to be translated from one character code to a
second character code, it is necessary to indicate for every character
in the first code its image in the second code. This can be specified
by listing the characters of the second code in the same sort order as
the first code.

An example of encoding the characters of a value in EBCDIC is
presented below. For the data item < name, JONES >, we have

J  -  11010001
O  -  11010110
N  -  11010101
E  -  11000101
S  -  11100010
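
The table above can be reproduced mechanically. The following is a minimal sketch that assumes Python's standard "cp037" codec (one common EBCDIC code page) agrees with the EBCDIC variant intended here; the function name value_string is introduced only for this illustration.

```python
def value_string(value, codec="cp037"):
    """Return the 8-bit code of each character of `value` as a bit string."""
    return [format(byte, "08b") for byte in value.encode(codec)]


bits = value_string("JONES")
for char, code in zip("JONES", bits):
    print(char, "-", code)
```

Passing a different codec name (for example "ascii") yields the value string under another character code, which is the situation the translation rule above addresses.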

2.  Length.  The  length  of  a  value  string  is  the  number  of  bits 
in  the  string. 

For  example,  the  value  string  of  the  attribute  name  in  the 
previous  example  may  be  specified  to  be  of  length  64  bits,  where  unused 
bits  may  be  filled  arbitrarily. 

3.  Length uniformity. If the value strings for an attribute are
always of uniform length, then the lengths of the value strings can be
described simply by giving the length. However, if the lengths of the value
strings for an attribute are not uniform, then either the length of
each value string must be given and stored as a data item, or the value
string must be delimited by special characters. Thus, value strings
may be specified as being either uniform or varying.

4.  Value alignment. When the lengths of the value strings for
an attribute are to be uniform, the number of characters needed to
represent the value may be less than the allotted length. In such cases,
it is necessary to specify whether the value is aligned to the right
or to the left and to specify the characters to be used to pad out the
unused positions.

For example, consider the data item < name, JONES >. The value
length of the attribute may have been specified as 64 bits and the
character code as EBCDIC. Specifying that the value is to be
aligned to the left, with blank characters used for padding, results in
the following encoding of the value JONES:

J        -  11010001
O        -  11010110
N        -  11010101
E        -  11000101
S        -  11100010
(blank)  -  01000000
(blank)  -  01000000
(blank)  -  01000000
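
The alignment and padding rule can be sketched as follows. This is an illustration under assumptions, not part of the model itself: the function name encode_value and its parameters are hypothetical, characters are assumed to occupy 8 bits each, and Python's "cp037" codec stands in for EBCDIC.

```python
def encode_value(value, length_bits, align="left", pad=" ", codec="cp037"):
    """Encode `value` as a uniform-length value string of `length_bits` bits."""
    width = length_bits // 8              # assumes 8-bit character codes
    if len(value) > width:
        raise ValueError("value does not fit in the allotted length")
    if align == "left":
        padded = value.ljust(width, pad)  # pad characters follow the value
    else:
        padded = value.rjust(width, pad)  # pad characters precede the value
    return "".join(format(byte, "08b") for byte in padded.encode(codec))


name_bits = encode_value("JONES", 64)                      # left, blank padded
age_bits = encode_value("32", 32, align="right", pad="0")  # right, zero padded
```

The second call shows the usual treatment of numbers, aligned to the right with zero pad characters.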

5.  Data type. Value strings may be interpreted either as characters
or as numbers. Numbers are either signed or unsigned strings of
digits. Signs may be denoted by the plus or minus characters, by radix
complement, or by diminished radix complement. Numbers may be organized
either as fixed point, or as floating point numbers with the number of
significant digits and the length of the mantissa specified.

6.  Value criteria. Numeric and set-theoretic criteria may be
used to define the set of acceptable values for a given attribute. For
example, values of the attribute age may be restricted to numbers between
21 and 65 for a given set of data items. Or values of the attribute city
may be restricted to a particular set of city names.
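
Value criteria of both kinds can be pictured as predicates attached to attributes. The sketch below is illustrative only: the dictionary criteria, the function acceptable, and the particular city names are assumptions made for the example.

```python
criteria = {
    # numeric criterion: ages between 21 and 65
    "age": lambda v: isinstance(v, int) and 21 <= v <= 65,
    # set-theoretic criterion: membership in a given set of city names
    "city": lambda v: v in {"PHILADELPHIA", "NEWTOWN"},
}


def acceptable(item):
    """Check a data item <attribute, value> against its criterion, if any."""
    attribute, value = item
    check = criteria.get(attribute)
    return True if check is None else check(value)
```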

3.2.1.3  Encoding Attributes

We have seen how the value of a data item is encoded. To encode
the entire data item we must now provide a way of identifying the
attribute to which that value belongs.

This can be achieved in two ways. The first way is to directly
encode the attribute as a bit or character string, and then position
this string relative to the value. This way of encoding an attribute
can be made to fulfill a second role. We saw in the discussion of
length uniformity in the section on encoding values that if a value is
specified as having varying length, then it must be delimited by
characters which signify the end of the value string. The attribute encoding
can serve as such a delimiter for the value string. We will call the
string which directly encodes an attribute an attribute marker.

The following characteristic is used to specify an attribute
marker.

7.  Attribute marker. Attribute markers can be either character
or bit strings which are positioned directly in front of or directly
behind a value string.
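
The double role of attribute markers, identifying the attribute and delimiting a varying-length value, can be sketched as follows. The marker strings "#N" and "#A" and both function names are hypothetical, markers are placed in front of the value, and the sketch assumes that a marker string never occurs inside a value.

```python
import re


def encode_with_markers(items, markers):
    """Place each attribute's marker directly in front of its value string."""
    return "".join(markers[attribute] + value for attribute, value in items)


def decode_with_markers(stream, markers):
    """Recover <attribute, value> pairs; each value ends at the next marker."""
    by_marker = {marker: attribute for attribute, marker in markers.items()}
    pattern = "|".join(re.escape(m)
                       for m in sorted(by_marker, key=len, reverse=True))
    # Splitting on a capturing group keeps the markers in the result:
    # ['', marker, value, marker, value, ...]
    parts = re.split("(" + pattern + ")", stream)
    return [(by_marker[parts[i]], parts[i + 1])
            for i in range(1, len(parts), 2)]


markers = {"name": "#N", "age": "#A"}
stream = encode_with_markers([("name", "JONES"), ("age", "32")], markers)
items = decode_with_markers(stream, markers)
```

Because the markers carry the attribute identity, the same decoder accepts the items in any order.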


The second way in which the attribute of a particular value can
be identified is by knowing that it always occurs in a certain position
relative to other values. That is, if a set of data items is organized
in such a way that the position of the value corresponding to a given
attribute can be identified, then the attribute has been indirectly
encoded by positioning. As the encoding of attributes by positioning
depends on the organization of sets of data items, this way of encoding
attributes will be discussed in the next section.

3.2.2  The Model of Records

3.2.2.1  The Conceptual Record Structure

In this section we want to model the conceptual structure of
records. First, however, we must pin down exactly what we mean by a
record itself. Then, we can go on to obtain the structure of such
records.

In the data processing field, a user of COBOL conceives of a
record differently than, say, a user of MARK IV. In the definition of
records below, we attempt to give an exact formalization of the notion
of record which is independent of any particular software system.

Definition 3-2.  A record is a set of data items which are structured
according to the following rules:

    record          →  group
    group           →  < attribute, {compound value} >
    compound value  →  compound value, compound value
    compound value  →  group
    compound value  →  data item

We use the symbols < > to denote an ordered set and the symbols { } to
denote an unordered set.

For example, the data items < name, JONES >, < age, 32 >, and
< sex, M > can be organized into the following record:

    < person, {< name, JONES >,
               < age, 32 >,
               < sex, M >} >

As another example, the data items < name, JONES >, < name, MARY >,
< age, 6 >, < name, JOHN >, < age, 10 > can be organized into the record:

    < family, {< name, JONES >,
               < child, {< name, MARY >,
                         < age, 6 >} >,
               < child, {< name, JOHN >,
                         < age, 10 >} >} >

In this case < child, {< name, MARY >, < age, 6 >} > and
< child, {< name, JOHN >, < age, 10 >} > are groups.
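
The rules of Definition 3-2 can be checked mechanically. The sketch below is one possible rendering, under stated assumptions: a data item is a Python pair (attribute, value) with a string or integer value, a group is a pair (attribute, list of compound values), and a list stands in for the unordered set { }. The function names are introduced only for this illustration.

```python
def is_data_item(x):
    """<attribute, value>: the value is a simple string or number."""
    return (isinstance(x, tuple) and len(x) == 2
            and isinstance(x[0], str) and isinstance(x[1], (str, int)))


def is_group(x):
    """<attribute, {compound value}>: each member is a group or a data item."""
    return (isinstance(x, tuple) and len(x) == 2
            and isinstance(x[0], str)
            and isinstance(x[1], list) and len(x[1]) > 0
            and all(is_data_item(c) or is_group(c) for c in x[1]))


def is_record(x):
    """record -> group"""
    return is_group(x)


family = ("family", [
    ("name", "JONES"),
    ("child", [("name", "MARY"), ("age", 6)]),
    ("child", [("name", "JOHN"), ("age", 10)]),
])
```

Under this reading a bare data item such as ("name", "JONES") is not a record, while the same item wrapped in a group is.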

It should be noted that a data item is simply an attribute-value pair,
whereas a group is an attribute-compound value pair. When it is
necessary to distinguish the attributes associated with compound values
from the attributes associated with values, we will refer to them as
group attributes and data item attributes respectively. In the example
above, "name" and "age" are data item attributes whereas "family" and
"child" are group attributes. Compound values are actually groups or
data items. The groups forming a group are called subordinate groups.

We note that as a consequence of the above definition the
structure of a record is a hierarchy which has an attribute
associated with each part of the hierarchy. We can thus abstract a
notion of record structure based on these attributes which is independent
of the values. This is done in Definition 3-3.

Definition 3-3.  A record structure is a relationship over data item
attributes produced according to the following structure productions:

    1.  record structure  →  structure
    2.  structure         →  < group attribute, {substructure} >
    3.  substructure      →  substructure, substructure
    4.  substructure      →  structure
    5.  substructure      →  data item attribute
    6.  substructure      →  null

For example, the data item attributes "name" and "age" may be
related by structures obtained from the following structure productions:

    family record structure  →  structure F1
    structure F1             →  < family, {substructure F1F1} >
    substructure F1F1        →  substructure F1, substructure F2
    substructure F1          →  name
    substructure F2          →  null
    substructure F2          →  substructure F2, substructure F2
    substructure F2          →  structure F2
    structure F2             →  < child, {substructure F21F1} >
    substructure F21F1       →  substructure F1, substructure F212
    substructure F212        →  age

Two particular structures of these attributes are:

    (i)   < family, {name} >
    (ii)  < family, {name, < child, {name, age} >,
                           < child, {name, age} >} >

Note:  i)  Production 3 in Definition 3-3 allows a particular
substructure to repeat an arbitrary number of times (e.g., in
the above example, substructure F2 → substructure F2, substructure F2).
ii)  Production 6 allows the occurrence of a particular
substructure to be optional (e.g., in the above example,
substructure F2 → null).

If we are given a structure, then we can obtain records from it
simply by substituting a data item for each data item attribute in the
structure.

For example, if we make the following substitutions in the
structures above:

    < name, JONES >   for name
    < name, MARY >    for name
    < age, 6 >        for age
    < name, JOHN >    for name
    < age, 10 >       for age

we obtain the following records:

    i)   < family, {< name, JONES >} >

    ii)  < family, {< name, JONES >,
                    < child, {< name, MARY >, < age, 6 >} >,
                    < child, {< name, JOHN >, < age, 10 >} >} >
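
The relation between a record and its structure can also be read in the opposite direction: dropping the values from a record leaves its structure. The following sketch assumes a record is held as nested Python pairs, with a list standing in for { }; the function name structure_of is hypothetical.

```python
def structure_of(compound):
    """Abstract the record structure by keeping attributes only."""
    attribute, value = compound
    if isinstance(value, list):
        # Group: keep the group attribute and recurse into the members.
        return (attribute, [structure_of(member) for member in value])
    # Data item: only the data item attribute remains.
    return attribute


family = ("family", [
    ("name", "JONES"),
    ("child", [("name", "MARY"), ("age", 6)]),
    ("child", [("name", "JOHN"), ("age", 10)]),
])

family_structure = structure_of(family)
```

Applied to record (ii) this yields structure (ii), and applied to record (i) it yields structure (i).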

In a previous section we saw how data items were encoded. Now
we must consider how the structure of a record is encoded.

3.2.2.2  Encoding the Record Structure

The structure of a record is a relationship over the data item
attributes in the record specified by structure productions. These
productions actually produce a hierarchic structure which has the data
item attributes on the lowest levels and each higher level identified
by a group attribute. Therefore, to encode the structure of a record
it is only necessary to ensure that the attribute which is associated
with each compound value can be identified.

We  have  seen  that  the  attribute  of  a  data  item  can  be  identified 
by  putting  a  marker  adjacent  to  its  value,  or,  when  the  data  item 
appears  in  a  group,  the  attribute  can  be  identified  by  the  position  of 
its  value  relative  to  the  values  of  other  attributes. 

The attribute associated with a compound value can be identified
in similar ways. Markers can be placed adjacent to the compound value
using the same "attribute marker" characteristic as before.
Alternatively, the attribute for a compound value can be identified by the
position in which the compound value occurs relative to the compound
values of other attributes.

We will now discuss what characteristics must be specified to
identify an attribute from the position of the compound value or value.
For convenience in this discussion, we will just use the term compound
value to refer to both compound values and values.

The attribute associated with a compound value can be identified
if the compound value occurs in a particular order with respect to the
compound values of other attributes in the same substructure. In this
case, the order can be specified by listing the attributes of the
compound values in the appropriate order. Further, if one of the
attributes in this list corresponds to a substructure which is optional,
then it must be specified that this attribute may not appear. Also,
if one of the attributes in the list corresponds to a substructure
which repeats, then the number of repetitions must be given.

The characteristics required to identify the attribute of a
compound value (or value) from the position of the compound value (or
value) are given below:

8.  Order. The order of compound values can be specified by
listing their attributes in the appropriate order. If the attributes
are allowed to appear in any order, then the encoding must be done by
markers.

9.  Occurrence. The occurrence of an attribute may be either
mandatory or optional within a substructure.

10.  Repetition number. The repetition number is the number of
times an attribute may occur consecutively in a substructure.

11.  Repetition uniformity. If the number of times an attribute
repeats is always the same (i.e., the repetition of the attribute is
uniform), then the repetition number can be specified simply by giving
the number directly. However, if the repetition of the attribute is not
uniform, then either the repetition number must be encoded and stored
as a data item, or the encoding of the values or compound values for
the attribute must be delimited.

12.  Repetition order. When the same attribute repeats, the
encoding of the values or compound values for it may be stored
either directly in any order or in some order described by criteria on the
values.

13.  Criteria. Numeric and set-theoretic criteria may be used to
define the set of acceptable values or compound values for each
attribute.
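
Identification of attributes by position, using the order and repetition characteristics, can be sketched as follows. This is a simplified illustration under assumptions: the function name decode_positional and the form of the specification are hypothetical, values are taken as already separated strings, and a non-uniform repetition is handled by storing the repetition number in front of the repeated values (an optional attribute then corresponds to a stored count of zero).

```python
def decode_positional(values, spec):
    """Attach attributes to stored values by position.

    values - flat list of value strings in stored order
    spec   - list of (attribute, repetition) in the specified order;
             repetition is an int when uniform, or None when the
             repetition number is itself stored ahead of the values
    """
    items, pos = [], 0
    for attribute, repetition in spec:
        if repetition is None:            # not uniform: read the stored count
            repetition = int(values[pos])
            pos += 1
        for _ in range(repetition):
            items.append((attribute, values[pos]))
            pos += 1
    if pos != len(values):
        raise ValueError("stored values do not match the specification")
    return items


spec = [("name", 1), ("child", None)]
two_children = decode_positional(["JONES", "2", "MARY", "JOHN"], spec)
no_children = decode_positional(["JONES", "0"], spec)
```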

3.2.3  The Specification of the Encoding Characteristics

In  the  previous  sections  we  have  seen  that  records  are  encoded  by 
specifying  certain  characteristics.  We  will  allow  each  characteristic 
to  be  specified  either: 

1)  directly  -  by  specifying  explicitly  the  characteristic,  or 

2) indirectly - by specifying a function which must be computed
to  determine  the  characteristic.  The  function  may  be  defined 
over  the  values  of  data  items  or  over  other  characteristics 
using  the  usual  arithmetic  operators. 

For example, the length characteristic can be specified directly
as a number of bits, or it can be specified indirectly as perhaps
(i) being equal to the value of some particular data item, or
(ii) being equal to the number of repetitions of some particular
attribute.
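The distinction between direct and indirect specification can be sketched in Python. This is only an illustration under stated assumptions: the names `resolve`, `Context`, `name_length` and `phone_repetitions` are hypothetical and do not come from the model or from GDDL.

```python
from typing import Callable, Dict, Union

# Values of data items and of other characteristics that an indirect
# specification may consult (hypothetical names, for illustration only).
Context = Dict[str, int]

# A characteristic is given either directly (a literal) or indirectly
# (a function that must be computed to determine it).
Spec = Union[int, Callable[[Context], int]]


def resolve(spec: Spec, context: Context) -> int:
    """Return the characteristic, computing it when it is given indirectly."""
    return spec(context) if callable(spec) else spec


# Direct: the length is stated explicitly as a number of bits.
direct_length: Spec = 64

# Indirect (i): the length equals the value of some particular data item.
length_from_item: Spec = lambda ctx: ctx["name_length"]

# Indirect (ii): the length equals the number of repetitions of an attribute.
length_from_repetitions: Spec = lambda ctx: ctx["phone_repetitions"]

# Indirect, using the usual arithmetic operators over both of the above.
length_from_function: Spec = lambda ctx: ctx["name_length"] + 8 * ctx["phone_repetitions"]

context: Context = {"name_length": 40, "phone_repetitions": 3}
lengths = [
    resolve(spec, context)
    for spec in (direct_length, length_from_item,
                 length_from_repetitions, length_from_function)
]
```

Treating every characteristic as "a value or a function" keeps the encoder uniform: it always calls `resolve` and never needs to know which form the user chose.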

3.3 Interpretation of Common Data Processing Concepts in Terms of the
Model of Record Structures

A set of structure productions together with a specification of
the rules for encoding the structures determines a particular type of
record, or record type. Two records are of the same record type if
and only if they can both be obtained from the same structure
productions and they both have the same encoding characteristics.

Note that the term record is sometimes used in data processing
literature to refer to what we call a record type.

Note that the production rules of Definition 3-2 make it possible
to distinguish easily between a data item and a record consisting of a
single data item, even though they both contain a single value. For
example, < name, JONES > is a data item, whereas < person, {< name,
JONES >} > is a record. This distinction reflects the fact that a data
item in itself is only a basic unit of information in some data
organization, whereas a data item structured as a record is in addition
the basic unit which is stored or retrieved when that data organization
is used.

Two groups are of the same group type if and only if they can
both be obtained from the same structure productions and they both
have the same encoding characteristics.

A data item corresponds to the intuitive idea of a field.

Two  fields  are  of  the  same  field  type  if  and  only  if  they  both 
have  the  same  attribute  and  are  both  encoded  in  the  same  way. 

In early versions of COBOL and in some CMS's only one type of
record is allowed per file. In these systems there was therefore no
need to refer to particular types of records. However, the model allows
for the appearance of more than one type of record in a file. Therefore,
some means of referring to particular types of records must be provided.
Similarly, it will be useful to be able to refer to particular types
of groups and fields. We will use the attribute A of a record (group,
field) < A, ... > to name the type of that record (group, field). Thus,
a record < person, {...} > is of type person, and a field < age, 10 > is
of type age. To ensure that this way of referring to types of records
(groups, fields) is unambiguous, we must make the following convention:

Within  a  file,  a  given  attribute  is  associated  with  only  one 
structure  and  only  one  set  of  encoding  characteristics. 

In particular this requires:

(1) A given attribute can occur in only one production of the
form:

structure → < attribute, {substructure} >

(2) If A occurs in a production of the form:
structure → < A, {substructure} >
then A cannot occur in the substructure.

We  will  see  in  Section  3.5  that  this  convention  ensures  that 
the  structure  productions  produce  only  hierarchic  organisations. 


3.4 An Application of the Model of Record Structures

An example of using the model to completely encode a set of data
items in a given structure as a bit string is given below:

Consider the data items < name, JONES >, < age, 32 >, and
< sex, M > and the structure specified by the structure productions:

person record structure → structure P1
structure P1 → < person, {substructure P1P1} >
substructure P1P1 → substructure P11, substructure P1P2
substructure P1P2 → substructure P12, substructure P13
substructure P11 → name
substructure P12 → age
substructure P13 → sex

The following record is obtained from these structure productions:

< person, {< name, JONES >,
           < age, 32 >,
           < sex, M >} >

The  bit  string  representation  of  this  record  is  produced  using 
the  following  encoding  characteristics: 

(1) The character code for the values of name, age and sex is
EBCDIC.

(2) The length of values of name is 64 bits, of age is 16 bits,
and of sex is 8 bits.

(3) The lengths of values of name, age and sex are uniform.

(4) The values of name are left aligned and padded with blanks.

(5) The values of name, age and sex are to be interpreted as
character strings.

(6) There are no restrictions defined by criteria on the values
of name, age and sex.

(7) No attribute markers are used with value strings of name,
age and sex.

The structure is encoded according to the following characteristics:

(8) The attributes name, age and sex appear in the order in which
they are named by the structure productions.

(9) An occurrence of each attribute is mandatory.

(10) Each attribute occurs once in a structure.

(11) The repetition for each attribute is uniform.

(12) Since there may be only one occurrence of the attributes name,
age and sex, the repetition order criterion does not apply.

(13) There are no restrictions defined by criteria on the compound
values of person.

Applying these encoding characteristics, the following record
representation results:

1101000111010110110101011100010111100010010000000100000001000000
111100111111001011010100

For every different set of data items which are substituted in the
structure obtained from the above set of structure productions, a
different bit string is produced by these encoding characteristics.
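The encoding just described can be reproduced with a short Python sketch. It rests on assumptions that are not in the text: Python's `cp037` codec is used as a stand-in for "EBCDIC" (it is one EBCDIC code page), each character is taken to occupy 8 bits, and the names `FIELD_LENGTHS`, `encode_value` and `encode_record` are hypothetical.

```python
# Lengths in bits for each attribute; all lengths are uniform.
FIELD_LENGTHS = {"name": 64, "age": 16, "sex": 8}

EBCDIC = "cp037"  # assumption: one EBCDIC code page stands in for "EBCDIC"


def encode_value(value: str, length_bits: int) -> str:
    """Encode a character-string value, left aligned and blank padded."""
    raw = value.encode(EBCDIC)
    width = length_bits // 8
    if len(raw) > width:
        raise ValueError(f"value {value!r} does not fit in {length_bits} bits")
    padded = raw.ljust(width, " ".encode(EBCDIC))
    return "".join(format(byte, "08b") for byte in padded)


def encode_record(items):
    """Concatenate the value encodings in the given (fixed) attribute order.

    No attribute markers are emitted: every attribute is mandatory and
    occurs exactly once, so position alone identifies it.
    """
    return "".join(encode_value(value, FIELD_LENGTHS[attr]) for attr, value in items)


bits = encode_record([("name", "JONES"), ("age", "32"), ("sex", "M")])
name_bits, rest_bits = bits[:64], bits[64:]
```

Substituting different data items, for example `("name", "SMITH")`, changes only the corresponding slice of the bit string, which is what the closing remark about different sets of data items describes.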

3.5 The Completeness and Generality of the Model

To  be  complete,  the  model  must  incorporate  in  itself  all  of  the 
characteristics  of  record  structures  derived  in  Table  2-1.  This  is 
done  for  the  data  item  characteristics  as  follows: 

Symbolic naming appears in the model as the concept of an
attribute.

The implementation characteristics for data items appear in the
model directly as encoding characteristics.

The  characteristics  relating  to  the  structure  of  records  are 
incorporated  in  the  model  as  follows: 

The  structuring  characteristics  of  records  appear  in  the  model 
as  the  concept  of  record  structure. 

The  implementation  characteristics  are  incorporated  directly  as 
encoding  characteristics. 

Thus, the model includes each of the record level characteristics
appearing in Table 2-1. In this sense, the model is complete.

We further note that the structure productions and the convention
of Section 3.3 impose a partial ordering on the attributes of a
structure. This is proved as follows:

Theorem: The structure productions and the convention of Section 3.3
impose a partial ordering over the attributes of a record structure.

Proof: A partial ordering is a relation which is

1) reflexive,

2) antisymmetric,

3) transitive.

Let us define ρ to be a relation over attributes as follows:
for attributes a and b, a ρ b if and only if a = b, or
< a, { ... b ... } > is a structure, where b may appear in any depth
of { , } or < , > brackets.

We will now show ρ is a partial ordering.

1) By definition ρ is reflexive.

2) Assume that a ρ b and b ρ a and that a ≠ b.
This means < a, { ... b ... } > and < b, { ... a ... } > are structures.
But by (1) of the convention, the attribute b can only be associated
with one substructure, which must therefore be { ... a ... }. Thus,
< a, { ... b ... } > is actually < a, { ... < b, { ... a ... } > ... } >.
This is not allowed by (2) of the convention. Thus, a ρ b and b ρ a
implies a = b. Hence ρ is antisymmetric.

3) Assume a ρ b and b ρ c.
If a = b and/or b = c, then a ρ c.
If < a, { ... b ... } > and < b, { ... c ... } > are structures,
then by (1) of the convention < a, { ... b ... } > is actually
< a, { ... < b, { ... c ... } > ... } >. Thus, a ρ c, and ρ is
transitive.

Therefore, ρ is a partial ordering.

Mathematically, any hierarchy can be realized by a partial
ordering (Bi I9W). From the above proof, it follows that the structure
productions and conventions can realise any hierarchic record structure.
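The argument can be checked mechanically on small examples. The following Python sketch is not part of the model; it assumes structure productions are given as a mapping from an attribute to the attributes appearing directly in its substructure, and the names `descendants`, `satisfies_convention`, `relation` and `is_partial_order` are hypothetical. Using a mapping means each attribute has at most one production, which corresponds to part (1) of the convention.

```python
from itertools import product

# Hypothetical productions: attribute -> attributes in its substructure.
productions = {
    "person": ["name", "address", "age"],
    "address": ["street", "city"],
}


def descendants(attr, productions):
    """Attributes appearing at any depth inside attr's substructure."""
    found, stack = set(), list(productions.get(attr, []))
    while stack:
        child = stack.pop()
        if child not in found:
            found.add(child)
            stack.extend(productions.get(child, []))
    return found


def satisfies_convention(productions):
    """Part (2): no attribute occurs inside its own substructure."""
    return all(a not in descendants(a, productions) for a in productions)


def relation(productions):
    """The relation of the proof: a = b, or b occurs somewhere within a."""
    attrs = set(productions) | {c for cs in productions.values() for c in cs}
    pairs = {(a, a) for a in attrs}
    for a in attrs:
        pairs |= {(a, b) for b in descendants(a, productions)}
    return attrs, pairs


def is_partial_order(attrs, pairs):
    reflexive = all((a, a) in pairs for a in attrs)
    antisymmetric = all(
        a == b or not ((a, b) in pairs and (b, a) in pairs)
        for a, b in product(attrs, repeat=2)
    )
    transitive = all(
        (a, c) in pairs
        for (a, b) in pairs for (b2, c) in pairs if b == b2
    )
    return reflexive and antisymmetric and transitive


ok = satisfies_convention(productions) and is_partial_order(*relation(productions))

# A set of productions that violates part (2) of the convention.
cyclic = {"a": ["b"], "b": ["a"]}
cyclic_ok = satisfies_convention(cyclic)
cyclic_order = is_partial_order(*relation(cyclic))
```

The cyclic case shows why part (2) is needed: without it the relation stops being antisymmetric, so the structure is no longer a hierarchy.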

The characteristics of Table 2-1 are incorporated in more
generalized forms in the model to allow for the description of
variations of existing data structures. This generality is provided in
the following ways:

1) The model provides a more generalized way to describe the
order of data items and groups. As we have seen in Table 2-1, current
systems only provide for the specification of fixed ordering. However,
the ordering characteristic of the model allows order to be specified
as fixed or as arbitrary relative to the groups. For example, consider
the following group:

< x, { < y, { < z, a >, < t, b > } >,
       < u, c >,
       < v, { < w, d >, < s, e > } > } >

with the following order characteristic:

The ordering for the compound values of attribute x is fixed, and
the  ordering  for  the  compound  values  of  attributes  y  and  v 
is  arbitrary. 

This  results  in  the  following  valid  orderings  of  the  values  a,  b,  c, 
d,  e: 

abcde,  bacde,  abced,  and  baced. 

Such  variable  orderings  are  not  permitted  in  current  systems. 
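The four orderings can be enumerated with a small Python sketch. The tuple representation and the function name `orderings` are assumptions made for illustration: a data item is `(attribute, value)` and a group is `(attribute, children, order_is_fixed)`.

```python
from itertools import permutations, product

# The group from the example: x is fixed, y and v are arbitrary.
group = ("x", [
    ("y", [("z", "a"), ("t", "b")], False),
    ("u", "c"),
    ("v", [("w", "d"), ("s", "e")], False),
], True)


def orderings(node):
    """Return every value sequence permitted by the order characteristics."""
    if len(node) == 2:                      # data item: (attribute, value)
        return [node[1]]
    _, children, order_is_fixed = node      # group
    arrangements = [tuple(children)] if order_is_fixed else list(permutations(children))
    results = []
    for arrangement in arrangements:
        # Combine every permitted ordering of each child, left to right.
        for parts in product(*(orderings(child) for child in arrangement)):
            results.append("".join(parts))
    return results


valid = sorted(orderings(group))
```

Changing the flag on `x` to `False` would also permute `y`, `u` and `v` among themselves, which illustrates that fixed and arbitrary ordering are chosen independently at each level.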

2) The model provides a more generalized way to specify the
encoding characteristics than is required to describe the characteristics
of Table 2-1.

In Table 2-1, we saw that the characteristics length and repetition
could be specified as depending on some single other data item. In the
model, all characteristics can be specified as depending on other data
items, other characteristics and functions of these. This greatly
increases the variety of encodings which can be specified.

In  these  ways,  the  model  allows  generalizations  of  current  data 
representations  at  the  record  level  to  be  specified. 

3.6  The  Relationship  Between  the  Model  and  GDDL 

GDDL  has  been  explicitly  designed  in  terms  of  the  model.  A  GDDL 
statement  consists  of  an  identifying  name  and  a  string  of  parameters. 

The FIELD and GROUP statements are used to describe the conceptual
organization of data items and groups. Each encoding characteristic of
data items and the structure of records can be specified by one or more
parameters in GDDL statements. The parameters and statements for these
characteristics are listed in Table 3-1 given below:


Value               Statements and                      Remarks  Specified in
Characteristics     Parameters                                   Section in
                                                                 Appendix A
-----------------------------------------------------------------------------
Character Code      FIELD statement, parameter (ii)              1.1
                    CHAR statement                               1.4.1
                    SET statement                                2.1.2.1

Length              FIELD statement, parameter (iii),            1.1
                    parameter (iv)

Length Uniformity   FIELD statement, parameter (v)               1.1

Value Alignment     FIELD statement, parameter (ix)              1.1

Data Type           FIELD statement, parameter (vi)              1.1

Value Criteria      GROUP statement, parameter (iii)             1.2
                    Criterion statements                         2.1

The FIELD statement has the following format:

FIELD ( field name, encoding characteristics )

This corresponds to the specification of a field type in the following
way. The attribute corresponds to the field name, and encoding
characteristics appear directly. Thus, we see that the FIELD statement
specifies data items.

The  GROUP  statement  has  the  following  format: 

GROUP  (  group  name,  ...  ;  (list),  ...  ,  (list)  ...). 

This corresponds to the specification of a group type in the following
way. Compare the structure productions of Definition 3-3 with this
format. The production of the type:

structure → < attribute, {substructure} >

corresponds to the format of the GROUP statement, with the attribute
corresponding to the group name, and with all the substructures that
can be obtained using the remaining types of productions corresponding
to (list), ... , (list). The encoding characteristics for each
substructure are included in each list. Thus, we see that the GROUP
statement specifies the structure for groups. To specify that a
particular group is to be treated as a record, the RECORD statement is
used (see Section 1.3 in Appendix A).

From the above table, we note that every characteristic of the
model is included in GDDL. Since the complete set of characteristics
can encode the structure and values of data items, GDDL therefore has
the same capability. This, in effect, completes the argument that GDDL
can specify any record level structures which can be described in
the model.

3.7 Demonstrations of GDDL's Completeness

In  the  previous  section  we  showed  that  GDDL  is  complete  for  record 
description  by  showing  that  the  model  on  which  it  was  based  is  complete. 
We now provide several practical examples of its completeness.

The  first  of  these  examples  is  a  demonstration  that  GDDL  contains 
the  COBOL  record  description  features  as  a  proper  subset.  COBOL  was 
chosen  because  it  is  the  prototype  for  almost  every  DBMS  DDL  and  for  the 
CODASYL  DDL  effort.  It  has  the  most  highly  developed  record  description 
capabilities  currently  available.  The  demonstration  is  given  in  Appendix 
C,  part  1.  In  Appendix  C,  part  2  three  examples  are  given  of  record 
characteristics  describable  in  GDDL  but  not  in  COBOL. 

The  remaining  examples  demonstrate  the  use  of  GDDL  in  describing 
real-world  records.  These  record  descriptions  are  part  of  larger 
examples  of  complete  conversions  of  data  from  one  structure  to  another. 
They are given in Appendix B.


CHAPTER  4  FILE  DESCRIPTION 


4.1  Introduction 

This chapter is devoted to the study and description of
organisations of records called files. We develop a model of file
structures which is a very general extension of current concepts of
files as analysed in Chapter 2. This model leads to the technique for
describing file structures that is incorporated in GDDL. This technique
is illustrated in a series of examples which show that GDDL can describe
several well-known file structures.

4.2  A  Model  of  File  Structures 

In Chapter 3, we developed a model of records. In this chapter,
we  are  concerned  with  the  record  as  a  basic  unit  of  storage  and  retrieval. 
When  large  numbers  of  records  are  to  be  stored  and  retrieved,  a  problem 
of  efficient  utilisation  arises.  For  example,  store  time  is  conserved 
if  data  need  not  be  rearranged  each  time  a  new  record  is  stored.  And 
search  time  is  conserved  if  records  can  be  so  arranged  that  each  record 
is  stored  physically  next  to  the  record  that  is  needed  next.  Then,  when 
the  first  record  to  be  used  is  found,  succeeding  records  can  be  directly 
accessed  in  the  order  of  usage.  However,  when  access  to  two  or  more 
records  from  a  single  record  is  required,  a  sequential  ordering  of 
records  does  not  in  itself  provide  the  most  efficient  utilisation. 

A user, then, should conceive of the records as being connected
together in some way by access paths. These paths make a record at
one point on a path accessible to records which occur at points previous
to it on the path. They represent connections among the records in
question that the user wants to exploit for storage and retrieval. We
call such an organization of records the conceptual file structure. When
this structure is implemented on a storage medium, it must be represented
in some way by a string of bits.

As  seen  in  Chapter  2,  there  are  currently  three  ways  in  which  the 
access  paths  of  a  file  structure  are  implemented.  If  there  is  to  be  an 
access  path  from  a  record  (say,  A)  to  another  record  (say,  B),  it  may  be 
implemented  by: 

(1)  sequencing  position  -  the  bit  string  representation  for  B 
is  concatenated  after  the  bit  string  representation  for  A 
(see  Figure  4-1,  a); 

(2) embedding pointers in the records - a pointer to B (i.e.,
an encoding of the position that the bit string representation
of B occupies in the record sequence) is included as a field
in A (see Figure 4-1, b);

(3) arranging pointers in tables - a pointer to B is concatenated
after the pointer to A in a sequence of pointers (called a
table) which is maintained separately from the records
themselves.

Ultimately, a pointer to B will give the physical address of the
bit string representation of B when it is stored on a storage medium.
How the actual bit string for a pointer can be obtained is discussed in
Chapter 5, after we have considered the organization of storage media.
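The three implementations of an access path from A to B can be contrasted in a small Python sketch. Everything here is an illustrative assumption: list positions stand in for storage positions, the strings `"bsr(A)"` and `"bsr(B)"` stand in for bit string representations, and the function names are hypothetical.

```python
# Stand-ins for the bit string representations of records A and B.
BSR_A, BSR_B = "bsr(A)", "bsr(B)"

# (1) Sequencing: the representation of B is concatenated after that of A.
sequenced = [BSR_A, BSR_B]


def next_by_sequence(store, position):
    return store[position + 1] if position + 1 < len(store) else None


# (2) Embedded pointers: each stored record carries, as a field, the
# position of the record it leads to. Physical order no longer matters.
embedded = [
    {"bsr": BSR_B, "pointer": None},  # position 0
    {"bsr": BSR_A, "pointer": 0},     # position 1, points to B
]


def next_by_embedded_pointer(store, position):
    pointer = store[position]["pointer"]
    return None if pointer is None else store[pointer]["bsr"]


# (3) Table of pointers: the pointer to B follows the pointer to A in a
# table kept separately from the records themselves.
records = [BSR_B, BSR_A]
table = [1, 0]  # pointer to A, then pointer to B


def next_by_table(records, table, table_index):
    following = table_index + 1
    return records[table[following]] if following < len(table) else None


results = (
    next_by_sequence(sequenced, 0),
    next_by_embedded_pointer(embedded, 1),
    next_by_table(records, table, 0),
)
```

In all three cases the access path from A leads to B; what differs is whether the connection is carried by position, by a field inside A, or by a separate sequence of pointers.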


[Figure 4-1. Implementation of Access Paths. (a) By sequencing: bsr A
followed by bsr B. (b) By embedding pointers: bsr A and pointer, bsr B
and pointer. (c) By using tables of pointers: pointer to A, pointer
to B. Here bsr means "bit string representation of".]


We  saw  in  Chapter  3  how  the  records  themselves  are  encoded  as  bit 
strings.  Now  we  must  consider  the  rules  for  encoding  the  file  structure 
into  a  bit  string.  If  the  file  structure  is  to  be  implemented  by 
sequencing,  the  rules  must  determine  the  sequence  in  which  the  bit  strings 
representing the records occur. If the file structure is to be implemented
by pointers, the rules must determine how the pointers are encoded
into  bit  strings,  where  these  bit  strings  must  be  positioned  in  relation 
to  the  bit  strings  of  the  records,  and  the  sequence  in  which  the  bit 
strings  of  the  records  must  occur.  These  rules  will  then  determine  a 
bit  string  which  represents  the  file  structure. 

There  are  thus  three  components  of  this  process: 

(1)  the  conceptual  file  structure, 

(2)  the  final  bit  string,  and 

(3)  rules  for  encoding  the  conceptual  file  structure  of  records 
as  a  bit  string. 

We therefore have to model each of these components. The modelling of
the conceptual structure is influenced by (Co 1970). The rules for
encoding are modelled after the work of (Hs 1970). The bit string
is simply a sequence of 0's and 1's.

First, the conceptual file structure will be described. Second,
the rules for encoding the file structure will be specified.

4.2.1 The Conceptual File Structure

We noted in the previous section that the file structure
determines which records are connected by access paths. In other words,
it determines a relation (called a file relation) among records on the
basis of access paths. Consider two records which we will call A and
B, such that either

(i)  the  bit  string  representation  of  B  is  concatenated  after 
the  bit  string  representation  of  A,  or 
(ii)  there  is  a  pointer  from  A  to  B. 

Then  we  say  that  there  is  a  direct  access  path  from  A  to  B.  Relative 
to  this  path  we  call  record  A  the  head  of  the  path  and  record  B  the  tail 
of  the  path.  This  terminology  allows  us  to  refer  to  records  connected  by 
access  paths  without  naming  the  specific  records. 

Definition  4-1.  The  file  relation  determined  by  access  paths  through  a 
set  of  records  consists  of  the  set  of  ordered  pairs  <  head  record, 
tail  record  >  for  each  direct  access  path. 

As examples, consider that we are given a set of records,

S = {r_1, ..., r_n} where r_i is a record for 1 ≤ i ≤ n.

(1) The access paths of the list structure:

r_1 → r_2 → r_3 → ... → r_{n-1} → r_n

give the relation I_1 = { < r_1, r_2 >, < r_2, r_3 >, ... ,
< r_{n-1}, r_n > }.

(2) The access paths of the tree structure:

[tree diagram]

give the relation I_2 = { < r_1, r_2 >, < r_1, r_3 >, < r_2, r_4 >,
< r_2, r_5 >, ... , < r_{n-2}, r_n > }.

(3) The access paths of the ring structure:

[ring diagram]

give the relation I_3 = { < r_1, r_2 >, ... , < r_i, r_{i+1} >, ... ,
< r_{n-1}, r_n >, < r_n, r_1 > }.

It  will  be  convenient  to  introduce  the  following  terminology: 

a) If the pair of records < r_i, r_j > is in a file relation
R, then we say that there is a path of length 1 from r_i to
r_j for relation R. Therefore, a direct access path has length
one.

b) If the pair of records < r_i, r_j > is not in a file relation
R, we say there is a path of length 0 from r_i to r_j for
relation R.

c) If the pairs of records
< r_1, r_2 >, < r_2, r_3 >, ... , < r_i, r_{i+1} >,
< r_{i+1}, r_{i+2} >, ... , < r_n, r_{n+1} >
are in a file relation R, then we say that there is a path
of length n from r_1 to r_{n+1} for relation R.
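A file relation and the path terminology above can be sketched in Python. Two assumptions are made that the text does not state: records are represented by their indices, and when several chains connect two records the shortest one is reported. Length 0 is used to mean that no chain of pairs connects the two records. The function names are hypothetical.

```python
from collections import deque


def list_relation(n):
    """Pairs <r_i, r_{i+1}> of the list structure on records 1..n."""
    return {(i, i + 1) for i in range(1, n)}


def ring_relation(n):
    """The list structure closed by the pair <r_n, r_1>."""
    return list_relation(n) | {(n, 1)}


def path_length(relation, start, end):
    """Length of a shortest path from start to end, or 0 if there is none."""
    successors = {}
    for head, tail in relation:
        successors.setdefault(head, []).append(tail)
    queue = deque((tail, 1) for tail in successors.get(start, []))
    seen = set()
    while queue:
        node, distance = queue.popleft()
        if node == end:
            return distance
        if node in seen:
            continue
        seen.add(node)
        queue.extend((tail, distance + 1) for tail in successors.get(node, []))
    return 0


chain = list_relation(5)
ring = ring_relation(5)
lengths = {
    "direct": path_length(chain, 1, 2),
    "end_to_end": path_length(chain, 1, 5),
    "backwards": path_length(chain, 5, 1),
    "ring_close": path_length(ring, 5, 1),
    "ring_round": path_length(ring, 1, 1),
}
```

In the list structure nothing leads back from the last record, so the backward length is 0, while in the ring every record reaches every other, including itself after a full circuit.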

To model the conceptual file structure we must have a way to
specify any file relation that a user may require. In general, there
may  be  an  arbitrarily  large  number  of  records  that  can  be  included  in 
a  file  structure.  Therefore,  it  is  not  practical  for  a  user  to  state 
the  file  relation  extensively  by  listing  all  the  pairs  of  records. 
Instead,  he  can  specify  criteria  over  the  records  which  will  determine 
when  two  such  records  are  to  be  in  the  relation.  Thus,  for  two  records 
A  and  B,  <  A,  B  >  is  a  member  of  a  file  relation  if  and  only  if  A  and  B 
satisfy  the  criteria  for  the  relation.  Such  criteria  can  describe 
explicitly  the  conditions  which  must  be  met  for  two  records  to  be 
connected  by  a  direct  access  path. 

We  provide  below  a  set  of  production  rules  for  specifying  criteria. 

At  this  point  it  is  worth  noting  that  in  Chapter  3,  we  were  only 
concerned  with  hierarchic  organizations  and  so  simple  production  rules 
were  all  that  was  necessary  to  specify  record  structures.  However, 
to organize records into files, a far wider variety of organizations is
required and, therefore, a more elaborate way of specifying them is
necessary.

Definition 4-2. A file structure is a file relation determined by
criteria obtained from the following production system:

Criterion Production System:

Primitives: attribute, bit string, character string, characteristic,
integer, arithmetic relations (=, ≤, etc.), arithmetic operators
(+, -, etc.), set membership relation (∈)

Rules to produce the names of records, fields, characteristics and paths:

index → (integer)
record-modifier → HEAD
                → TAIL
                → X integer
attribute-form → attribute
               → attribute index
record-attribute → attribute record-modifier
                 → attribute
attribute-modifier → attribute-form
                   → attribute-form OF attribute-modifier
                   → attribute record-modifier
record-reference → record-attribute
                 → record-attribute criterion
field-reference → attribute-modifier
characteristic-reference → characteristic
path-reference → PATH ( record-reference, record-reference, criterion )

Rules to produce set-theoretic criterion:

constant → character string
         → bit string
set-member → field-reference
           → constant
           → set-member, set-member
set → {set-member}
set-criterion → field-reference ∈ set
              → characteristic-reference ∈ set

Rules to produce arithmetic criterion:

term → VALUE ( field-reference )
     → PARAMETER ( characteristic-reference )
     → LENGTH ( path-reference )
     → constant
arithmetic-operator → +
                    → -
                    → ×
relation-symbol → =
                → ≠
                → <
                → ≤
                → >
                → ≥
arithmetic-expression → term
                      → (arithmetic-expression) arithmetic-operator
                        (arithmetic-expression)
arithmetic-criterion → arithmetic-expression relation-symbol
                       arithmetic-expression

Rules for quantifying and combining criteria:

quantifier → ALL ( X integer )
           → SOME ( X integer )
criterion → arithmetic-criterion
          → set-criterion
          → quantifier (criterion)
          → NOT (criterion)
          → (criterion) AND (criterion)
          → (criterion) OR (criterion)
Note: A quantifier is required in a criterion only when a
record-modifier contains a string of the form: X integer. There must
be one quantifier of the form: ALL or SOME ( X integer ) for
each unique string X integer in the criterion.

This production system is used to specify criteria which determine
when there is to be a direct access path from one record to another.

We imagine a processor which is compiling a file relation over a
set of records. For any two records that the processor picks up
(potential head and tail records), we describe to the processor
criteria which determine whether the two records are to be linked. The
criteria can be over:

(i) the values of data items in the records,

(ii) the structural properties of the records,

(iii) the implementation of the records, and/or

(iv) any linkages already compiled.

The first three factors produce data and record dependent file
structures, and the last factor produces purely graph-theoretic
structures. These criteria are expressed as arithmetic and set-theoretic
expressions. The values and characteristics being tested in criteria
may occur in head or tail records or in any number of distinct records
other than the head or tail records; and direct access paths may similarly
exist between head, tail and arbitrary records. When criteria are
specified for records other than head or tail records, record-modifiers of
the form X integer are used in referring to these records. For each
unique reference of this kind, there must be a quantifier which
indicates whether the criterion in which the reference appears must
hold for all, or at least one, of the records of the type in question.

As examples, consider the following set of records:

S = { r_1 = < person, {< name, JOHN DOE >,
                       < soc. sec. no., 073028556 > } >,
      r_2 = < person, {< name, JAMES DOE >,
                       < soc. sec. no., 029110076 >,
                       < spouse, MARY BROWN >,
                       < child, JOHN DOE > } >,
      r_3 = < person, {< name, MARY BROWN >,
                       < soc. sec. no., 008i»1263f >,
                       < spouse, JAMES DOE >,
                       < child, JOHN DOE > } >,
      r_4 = < person, {< name, MARK BROWN >,
                       < soc. sec. no., 21^325629 >,
                       < spouse, ALICE BROWN >,
                       < child, MARY BROWN > } >,
      r_5 = < person, {< name, ALICE JONES >,
                       < soc. sec. no., 345291102 >,
                       < spouse, MARK BROWN >,
                       < child, MARY BROWN > } > }

(1) Consider now a criterion which orders the records by soc. sec.
no. The criterion which determines when there is a direct access path
from a HEAD record of type person to a TAIL record of type person can
be stated in English as:

The value of soc. sec. no. in the HEAD record is less
than the value of soc. sec. no. in the TAIL record;
and there is no other record of type person having
a soc. sec. no. between the one in the HEAD record
and the one in the TAIL record.

This criterion is expressed using the Criterion Production System as:

(soc. sec. no. OF person HEAD < soc. sec. no. OF person TAIL) AND
(ALL (X1) (NOT ((soc. sec. no. OF person HEAD < soc. sec. no. OF
person X1) AND (soc. sec. no. OF person X1 < soc. sec. no. OF
person TAIL))))

This determines the following file structure:

r_3 → r_2 → r_1 → r_4 → r_5

Note: In this example, soc. sec. no. OF person HEAD,
soc. sec. no. OF person TAIL, and soc. sec. no. OF person X1
are all value-references.
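The role of the imagined processor in example (1) can be sketched in Python. The sketch is an illustration only: `compile_relation` and `criterion_1` are hypothetical names, and because two of the social security numbers are partly illegible in the source, the values used for r3 and r4 are placeholders chosen solely to preserve the stated order.

```python
# Social security numbers as integers. The values for r3 and r4 are
# placeholders (assumptions), not the numbers from the record set.
ssn = {
    "r1": 73028556,
    "r2": 29110076,
    "r3": 8000000,     # placeholder
    "r4": 210000000,   # placeholder
    "r5": 345291102,
}


def criterion_1(head, tail, records):
    """HEAD < TAIL, and ALL (X1): NOT (HEAD < X1 AND X1 < TAIL)."""
    return ssn[head] < ssn[tail] and all(
        not (ssn[head] < ssn[x] and ssn[x] < ssn[tail]) for x in records
    )


def compile_relation(records, criterion):
    """Test every potential head/tail pair against the criterion."""
    return {
        (head, tail)
        for head in records
        for tail in records
        if criterion(head, tail, records)
    }


relation_1 = compile_relation(list(ssn), criterion_1)
```

The quantified part is what turns a plain "less than" comparison, which would link every record to all larger ones, into a chain linking each record only to its immediate successor.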

(2) Consider a criterion which arranges the records into a
family tree. The criterion which determines when there is a direct access
path from a HEAD record of type person to a TAIL record of type person
can be stated in English as:

The value of name in the HEAD record equals the value of child
in the TAIL record.

This criterion is expressed using the Criterion Production System as:

name OF person HEAD = child OF person TAIL

This determines the following file structure:

MARK BROWN r_4      r_5 ALICE JONES
              \    /
JAMES DOE r_2      r_3 MARY BROWN
              \    /
               r_1
             JOHN DOE

Note: In this example, name OF person HEAD and child OF person TAIL
are value-references.
(3) Finally, consider a criterion which, combined with the criterion
of example (1), creates a ring structure. The criterion which
determines when there is a direct access path from a HEAD record of
type person to a TAIL record of type person can be stated in English as:

There is a path of any length determined by the criterion of
example (1) from the TAIL record to the HEAD record; and there
is no path determined by the criterion of example (1) either
from the HEAD record or to the TAIL record.

This criterion is expressed by the Criterion Production System as:

(LENGTH (PATH (person TAIL, person HEAD, criterion (1))) ≥ 1)
AND (ALL (X1) ((LENGTH (PATH (person HEAD, person X1, criterion
(1))) = 0) AND (LENGTH (PATH (person X1, person TAIL, criterion
(1))) = 0)))

where criterion (1) is the criterion of example (1). This, together
with criterion (1), determines the following file structure:

[Figure: ring file structure. One arrow style marks the access paths
determined by the criterion of example (1); the other marks the access
paths determined by the above criterion.]
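
The ring criterion can be sketched in the same way. The Python below is only an illustration under stated assumptions: the direct access paths of criterion (1) are written out by hand as a simple chain, each head record is assumed to have at most one tail, and the helper names `path_length` and `criterion_3` are hypothetical.

```python
# Direct access paths produced by criterion (1), as (head, tail) pairs.
CRITERION_1 = [("r3", "r2"), ("r2", "r1"), ("r1", "r4"), ("r4", "r5")]
RECORDS = ["r1", "r2", "r3", "r4", "r5"]


def path_length(start, end, paths):
    """Number of links on the path from start to end, or 0 if there is none.

    Assumes the paths form a chain without cycles, one tail per head.
    """
    successor = dict(paths)
    length, current = 0, start
    while current in successor:
        current = successor[current]
        length += 1
        if current == end:
            return length
    return 0


def criterion_3(head, tail, records, paths):
    """True when a direct access path from head to tail closes the ring."""
    # A criterion (1) path of any length leads from the tail to the head.
    reaches = path_length(tail, head, paths) >= 1
    # No criterion (1) path leaves the head, and none arrives at the tail.
    isolated = all(
        path_length(head, x1, paths) == 0 and path_length(x1, tail, paths) == 0
        for x1 in records
    )
    return reaches and isolated


closing = [
    (h, t) for h in RECORDS for t in RECORDS
    if criterion_3(h, t, RECORDS, CRITERION_1)
]
print(closing)
```

Only the pair (r5, r3) satisfies the criterion, so the single added access path runs from the last record of the chain back to the first, which is what turns the sequential structure into a ring.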

In representing a file on a computer medium both the records and
the file structure must be encoded. The records are encoded into bit
strings as discussed in the previous chapter. The next section
discusses the encoding of the file structure.
We shall consider each of these encoding characteristics separately.

1.  Encoding  method.  The  encoding  method  is  either  sequential, 
embedded  pointer,  or  pointer  table  encoding. 

2.  Link  number.  The  link  number  is  the  number  of  tail  records 
to  which  any  head  record  is  connected  by  a  direct  access  path.  In 
encoding  embedded  pointers,  this  is  the  number  of  pointers  that  may  be 
stored  in  a  single  head  record  to  encode  a  particular  file  relation 
(or  the  maximum  such  number).  In  pointer  table  encoding,  the  link 
number  is  the  number  (or  maximum  number)  of  entries  in  the  table  for  a 
single  head  record. 

3. Linkage uniformity. If the link number is always to be the
same for each head record (i.e., the number of access paths starting
from any record is uniform), then the link number gives the actual number
of those access paths. Otherwise, the link number gives an upper bound
on the actual number.

4. Path length. Path length gives an upper bound on the length
of a path encoded by embedded pointers. For example, for a tree
structure, path length sets the maximum depth of the trees. If the maximum
is reached and more records remain to be connected by paths, a new
structure containing these records is started.

5. Connection set number. The connection set number gives the
maximum number of records that are connected together by access paths.
For example, in a tree structure the connection set number gives the
maximum number of records in the tree. If the maximum is reached and
more records remain to be connected by paths, a new structure is started.
This completes the characteristics which must be specified to
determine the rules for encoding a file relation.

As  for  the  encoding  characteristics  for  record  description,  we 
allow  each  characteristic  to  be  specified  either: 

1)  directly  -  by  specifying  explicitly  the  characteristics,  or 

2)  indirectly  -  by  specifying  a  function  which  must  be  computed 
to  determine  the  characteristic.  The  function  may  be  defined 
over  the  values  of  data  items  or  other  characteristics  using 
the  usual  arithmetic  operators. 

For example, the link number characteristic can be specified directly,
or it can be specified indirectly as, perhaps, being equal to the
value of some data item in the head record in question.
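
The distinction between direct and indirect specification can be sketched as follows. This is a minimal Python illustration, not part of the model or of GDDL: the class `FileRelationEncoding`, its field names, and the data item "number of children" are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

# A characteristic is either a literal value (direct specification) or a
# function of the head record that computes the value (indirect specification).
Record = dict
Characteristic = Union[int, Callable[[Record], int]]


@dataclass
class FileRelationEncoding:
    """The five encoding characteristics of a file relation (illustrative names)."""
    method: str                                    # sequential, embedded pointer or pointer table
    link_number: Characteristic = 1
    link_uniform: bool = True
    path_length: Optional[Characteristic] = None   # None means no limit
    connection_set_number: Optional[Characteristic] = None

    def resolve(self, characteristic, head):
        """Return the value of a characteristic for a given head record."""
        if callable(characteristic):
            return characteristic(head)
        return characteristic


# Direct: the link number is stated explicitly.
direct = FileRelationEncoding(method="embedded pointer", link_number=1)

# Indirect: the link number equals a (hypothetical) data item of the head record.
indirect = FileRelationEncoding(
    method="pointer table",
    link_number=lambda head: head["number of children"],
    link_uniform=False,
)

head = {"name": "MARK BROWN", "number of children": 2}
print(direct.resolve(direct.link_number, head))
print(indirect.resolve(indirect.link_number, head))
```

Treating every characteristic as "value or function" keeps the two forms of specification interchangeable wherever a characteristic is consumed.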

We  will  now  show  how  these  characteristics  are  used  to  encode  a 
file using the embedded pointer or pointer table methods:

(1)  When  using  either  of  these  methods  to  encode  a  file,  another 
criterion  must  be  defined  to  determine  the  sequence  of  the  records. 

(2)  When  a  file  relation  is  encoded  by  embedded  pointers,  a  field 
is  specified  at  the  record  level  to  contain  the  pointers.  This  field 
forms  a  part  of  the  actual  record  and  is  itself  encoded  in  the  usual  way. 

(3) When a file relation is encoded by pointer tables, the
encoding of the table must also be described. This is done by treating
the table as a file whose records contain the pointers.
4.3 Applications of the Model of File Structures

We will now illustrate how the model is used to encode a file
structure into a bit string. We will take the file structure specified
in Example (1) and encode it into a bit string for each of the
sequential, embedded pointer and pointer table methods.

(1) Sequential encoding.

The characteristic specified below:

1. encoding method is sequential;

applied to the file structure results in the bit string illustrated in
Figure 4-2.

[Figure: five adjacent boxes, each holding the bit string encoding of
one of the five records]

Figure 4-2. Bit String Representation of
File Sequentially Encoded

(2) Embedded pointer encoding.

Let us assume that a field has been specified at the record level
that is to contain the pointer. Assume this field is positioned at the
end of the person record. Let us also assume that the record sequence
is to be arbitrary. Then the following characteristics applied to the
file structure of Example (1) result in the bit string illustrated in
Figure 4-3.
1. encoding is by embedded pointer,

2. the link number is 1,

3. the link number is uniform,

4. no limits are put on path length, and

5. no limits are put on the number of records linked together.

position in the string:   a    b    c    d    e
record encoded:           r3   r1   r5   r4   r2
embedded pointer:         e    d    -    c    b

Figure 4-3. Bit String Representation of File
Encoded by Embedded Pointers


In this bit string the order of the records is r3, r1, r5, r4, r2, and
a, b, c, d, and e are the positions of each record in the string. These
positions appear in the records as pointers. This is illustrated in Figure 4-4.
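
A small Python sketch may make the embedded pointer encoding concrete. It is illustrative only: the record sequence r3, r1, r5, r4, r2 and the symbolic positions a to e are assumptions used for the example, pointers are kept as symbolic positions rather than encoded bit strings, and the function names are hypothetical.

```python
# Access paths of Example (1): each head record has exactly one tail (link number 1).
PATHS = {"r3": "r2", "r2": "r1", "r1": "r4", "r4": "r5"}

# Assumed arbitrary record sequence and the symbolic position of each record.
SEQUENCE = ["r3", "r1", "r5", "r4", "r2"]
POSITIONS = dict(zip(SEQUENCE, "abcde"))


def embed_pointers(sequence, positions, paths):
    """Pair each record with the position of its tail record (None if it has no tail)."""
    return [
        (positions[rec], rec, positions.get(paths.get(rec)))
        for rec in sequence
    ]


def follow(layout, start):
    """Traverse the encoded file by chasing embedded pointers from a start position."""
    by_position = {pos: (rec, ptr) for pos, rec, ptr in layout}
    order, pos = [], start
    while pos is not None:
        rec, pos = by_position[pos]
        order.append(rec)
    return order


layout = embed_pointers(SEQUENCE, POSITIONS, PATHS)
for pos, rec, ptr in layout:
    print(pos, rec, ptr)
print(follow(layout, "a"))
```

Although the records are stored in an arbitrary sequence, following the pointers from position a recovers the order r3, r2, r1, r4, r5 determined by the criterion of Example (1).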
Figure 4-4. File Linked by Embedded Pointers

Figure 4-6. File Linked by a Pointer Table

4.4  The  Completeness  and  Generality  of  the  Model 

We will now show that the model of file structures is complete.
We will do this by showing that the structuring characteristics at the
file level of Table 2-1 are properly contained in the conceptual part
of the model and that the implementation characteristics are contained
in the encoding rules of the model. The structuring characteristics
of each software system are actually arranged to provide a highly
restricted means for specifying criteria for connecting records. The
criterion production system for the model allows the criteria specifiable
by each software system to be expressed. The implementation
characteristics of every software system as derived in Table 2-1 are specified
directly by the encoding characteristics of the model. Thus, every
file that can be specified in the software systems of Chapter 2 can be
specified in the model. In this sense, the model is complete.
We will now show how the model is generalised.

The criterion production system which is the basis of the conceptual
part of the model is more general than any means of specifying
criteria by an existing software system in three ways:

(1) it allows criteria to be defined on the actual encoding
characteristics, as well as on values or paths,

(2) it allows criteria to be defined over arithmetic and set-theoretic
functions of values, characteristics and paths, and

(3) it allows a criterion to be defined in terms of more than
one criterion where the criteria are connected by the
connectives AND, OR and NOT.

The use of the criterion production system (cps) to describe file
structures is intended to avoid the deficiencies inherent in certain
other approaches. We first note that we reject the implicit approach,
where file structures are given implicitly by programs used to access
data in the structures, because such programs tend to obscure rather
than emphasize the logical and graphical basis of the file structure.
We need to describe file structures so that they are easily understood
by humans as well as machines. A second, and apparently straightforward,
approach would be to name explicitly what linkages are required in terms
of common graphical structures such as trees and rings. This approach is
inadequate for two reasons. First, there are many ways in which such
linkages are determined. For example, the linkages between records in
a sequential structure may be determined on the basis of the values
occurring in some particular type of field, or they may be determined by
a particular record encoding characteristic such as the length of a
field. It is inadequate to specify a type of file structure without
specifying what properties of records are used to determine that
structure. Secondly, there is no capability for describing new file structures.
A third approach, which lies at the opposite extreme to the one above,
was suggested but not provided in (Co 1970). Codd suggests using an
applied predicate calculus to define how linkages are constructed
from certain primitive relations and functions. This approach provides
generality, but does not provide a description which is immediately
intelligible to a user in terms of accepted data processing concepts
such as field, record and file. Furthermore, the predicate calculus
alone does not provide the methods for encoding the data and the means
for implementing the description.

Our approach in the cps is to provide as much of the explicit
information provided by the second approach as possible while preserving
the generality inherent in the third approach. The primitives of the
cps include only data-related concepts like attribute and characteristic
and arithmetic relations like "greater than" and "not equal to". Criteria
produced by the cps are intelligible and refer explicitly to relations
over records, values and characteristics. This can be seen by comparing
the English and cps ways of expressing criteria in the examples given
after Definition k-P. The cps expressions largely retain the intuitive
concepts of the English expressions. The cps has quantifiers like the
ones in the predicate calculus, but these quantifiers apply only to
predicates which are intuitively meaningful criteria. The concept of
variable appears as a general name for a record to be used when all or
some records are to be tested for a particular property. It is not a
variable in the algorithmic sense. In summary then, the cps allows data
structure terminology to be used explicitly in a very general way of
describing file structures.

The implementation methods included in the model are a generalization
of the work of (Hs 1970). In this paper, Hsiao shows how different
common file structures are special cases of a generalized way of
implementing files using tables of pointers (called directories) and embedded
pointers to create access paths between records. The generality of this
work is increased by extending it to include the sequencing of records.

Finally, the model provides a more generalized way to specify
encoding characteristics than is necessary for describing the
characteristics of Table 2-1. The encoding characteristics for file description
as well as those for record description can be specified as depending on
data items, other characteristics and functions of these. This greatly
increases the variety of encodings which can be specified.

In  these  ways,  the  model  allows  generalizations  of  current  file 
structure  technology  to  be  described. 

4.5 The Relationship Between the Model and GDDL

GDDL has been explicitly designed in terms of the model. The LINK
and criteria statements are used to describe the conceptual file
structure and the LINK and FILE statements are used to describe the
implementation of files. The exact relationship between GDDL and the model is
described by Table 4-2.
This completes the argument that GDDL can describe any file
representation which can be specified in the model. Therefore, GDDL is
complete and general in the sense described above.

4.6  Demonstrations  of  GDDL's  Completeness 

In  the  previous  sections  we  showed  that  GDDL  is  complete  for  file 
description  by  showing  that  the  model  on  which  it  was  based  is  complete. 
We  now  provide  practical  examples  of  its  completeness.  These  examples 
demonstrate  the  use  of  GDDL  in  describing  real-world  files.  The  file 
descriptions  are  part  of  larger  examples  of  complete  conversions  of 
data  from  one  structure  to  another.  These  examples  are  given  in  Appen¬ 
CHAPTER 5 STORAGE DESCRIPTION


5.1  Introduction 

In this chapter we complete our task of showing how the organization
of data can be explicitly described. We now present a final model
to show how a bit string representation of a file, produced by the
models of Chapters 3 and 4, is transformed to its final physical
representation (e.g., a sequence of appropriately positioned magnetic
spots on a tape). This model of storage structures is shown to be
complete by demonstrating that it incorporates all of the storage level
characteristics derived in Table 2-1. We also discuss how the model
includes certain generalizations of some of these characteristics. Then
we show that the storage level GDDL is based on this model. In this
way we can show that GDDL is also complete and generalised in the above
sense. We further demonstrate the completeness of GDDL by providing
a set of examples which illustrate the ability of GDDL to describe data
structures on particular devices.

5.2  A  Model  of  Storage  Structures 

In the previous chapters we followed the process by which a user
organizes his data into records, encodes these records as a bit string,
then sets up access paths between records and finally encodes these
access paths to produce a single bit string representing his data.
Now the user must specify how this bit string is to be distributed over
the physical storage medium, taking into account the physical constraints
of the medium (e.g., track size, tape blocks) with a view to obtaining
efficient usage. Normally a user wants to position the bit strings
corresponding to his records relative to the physical boundaries of a
medium (e.g., tape gaps, beginning of tracks). These boundaries of a
medium can occur at several levels. For example, in the case of a
magnetic disk, the levels from lowest to highest are: blocks on a track,
tracks on a cylinder, cylinders in a disk pack, one disk pack from the
number available. These boundaries can be organized in a hierarchy,
with each level in the hierarchy split up into several lower levels.
We will call the unit of storage at the lowest level of a device (e.g.,
blocks for a disk track) a basic block; and the unit of storage at each
of the higher levels a block. Thus, blocks can contain other blocks
and/or basic blocks. We call this structuring of the blocks the storage
structure.

Once a user has specified the storage structure, he must specify
how the blocks are to be encoded. This is done by specifying encoding
characteristics such as length of basic blocks, and the labels that are
used to indicate the beginning and end of blocks.

Once the storage structure and its encoding have been specified,
the user must specify how he wants his records to be positioned relative
to the basic blocks in the storage structure. He may specify, for
example, whether there is to be a maximum number of records which are
to be positioned within a block; whether records must be maintained
whole, or can be split across blocks; whether any record or only certain
record types can be positioned first in a block. He must also specify
how any pointers contained in the file are to be interpreted in terms
of such characteristics as addressing schemes and mode. The rules for
positioning records within blocks and the pointer characteristics
determine the addresses which are encoded into bit strings and then used as
pointers to encode the structure of the file (see Chapter 4). There
must be a mechanized process which, using a description of the file
structure and of the record structures, and the record positioning rules
and addressing schemes, obtains the actual bit strings for the pointers.

This  specification  determines  the  bit  string  representation  of 
the  file  in  its  storage  structure. 

To  encode  this  bit  string  onto  a  particular  storage  medium,  the 
user  must  specify  medium  dependent  characteristics,  such  as  to  which 
physical  levels  of  the  medium  the  blocks  correspond  and  the  actual 
physical  encoding  (e.g.,  tape  density). 

We  can  identify  five  parts  in  this  process: 

(1)  The  specification  of  the  storage  structure; 

(2)  The  encoding  of  the  storage  structure; 

(3)  Rules  for  fitting  records  into  blocks  and  implementing 
pointers; 

(4) The resulting bit string representation of the file in its
storage structure; and

(5) Medium dependent encoding of this bit string.
We will model here the first four parts only. The transformation
to the fifth part from the fourth part is quite straightforward, though
hardware dependent, and will be discussed separately in Section 5.6.

We model the storage structure by a set of production rules for
specifying the structure of the blocks in terms of basic blocks and
block names. The encoding rules of the storage structure are modelled
directly by giving the characteristics which must be specified to encode
basic blocks and block names. The rules for positioning records and
implementing pointers are also modelled directly by giving the necessary
characteristics.

The model will be presented by first giving the storage structure
part, then the encoding characteristics, and finally the rules for
positioning records and implementing pointers.

5.2.1 The Conceptual Structure of Storage

The conceptual structure of storage will be described in terms of
storage cell, basic block and block name. A definition of storage item
based on these concepts is given.

A storage cell is the basic unit written or read by a storage
device as a single character. For example, on tape, this is either
a 7- or 9-bit column.

Intuitively, a basic block is a string of consecutive storage
cells which are separated from other storage cells by physical delimiters
(e.g., tape gaps). When several basic blocks are to be processed in
the same way (i.e., they have the same length constraints and contain
records positioned in similar ways), each of these basic blocks is
assigned the same block name.


Definition 5-1. A storage item is an ordered pair < n, b > where n is a
block name and b is a basic block.

For example, the pairs:

< basic block A, tape block 1 >
< basic block A, tape block 2 >
< basic block E, tape block 3 >

are storage items.

Definition 5-2.* A structured set of storage items (sssi) is any set
of storage items which are structured according to the following
rules:

sssi -> < block name, {block} >
block -> block, block
block -> sssi
block -> storage item


* Throughout this discussion, it will be useful to note the analogy
between the conceptual organization of sssi's and the conceptual
organization of groups (Chapter 3). Both sssi's and groups are
given a hierarchic organization, so one should expect a strong
correspondence between how they are modelled. The correspondence
here is as follows:

block name ≡ attribute
basic block ≡ value
block ≡ compound value
storage item ≡ data item
sssi ≡ group.
For example, the following sssi represents a magnetic tape
containing n physical blocks of the same kind:

< tape file X, {< basic block A, tape block 1 >,
< basic block A, tape block 2 >,
...
< basic block A, tape block n > } >

This sssi represents the physically formatted tape illustrated
in Figure 5-1.

[Figure: tape file X]

Figure 5-1. Formatted Tape
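
Definitions 5-1 and 5-2 describe a recursive structure, which can be sketched as a pair of data types. The Python below is an illustration only; the class names `StorageItem` and `SSSI` and the helper functions are assumptions made for the example and are not part of the model's notation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StorageItem:
    """Definition 5-1: an ordered pair < block name, basic block >."""
    block_name: str
    basic_block: str


@dataclass
class SSSI:
    """Definition 5-2: < block name, {block} >, where each block is an sssi or a storage item."""
    block_name: str
    blocks: List[Union["SSSI", StorageItem]]


def basic_blocks(node):
    """Collect the basic blocks of an sssi in order, descending through nested sssi's."""
    if isinstance(node, StorageItem):
        return [node.basic_block]
    found = []
    for block in node.blocks:
        found.extend(basic_blocks(block))
    return found


def tape_file(name, n):
    """Build the sssi for a tape holding n physical blocks of the same kind."""
    items = [StorageItem("basic block A", f"tape block {i}") for i in range(1, n + 1)]
    return SSSI(name, items)


tape = tape_file("tape file X", 3)
print(basic_blocks(tape))
```

Because an `SSSI` may contain further `SSSI` values, the same two types also cover multi-level devices such as a disk file made of cylinders, tracks and blocks.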


We  now  abstract  a  notion  of  storage  structure  for  sets  of  storage 
items  based  solely  on  block  name. 

Definition 5-3. A storage structure for a set of storage items is a
relationship, over the block names of the storage items, which
can be produced by the following block productions:

1. storage structure -> structure

2. structure -> < sssi block name, {substructure} >

3. substructure -> substructure, substructure

4. substructure -> structure

5. substructure -> storage item block name

6. substructure -> null

For example, the block names block I, block J, block K, and block
L may be related by structures obtained from the following block
productions:

storage structure -> disk structure
disk structure -> < disk file Y, {substructure Y1} >
substructure Y1 -> substructure Y1, substructure Y1
substructure Y1 -> cylinder structure
cylinder structure -> < cylinder, {ssY1Y1} >
ssY1Y1 -> ssY11, ssY1YYY2
ssY1YYY2 -> ssY12, ssY1YY2
ssY1YY2 -> ssY12, ssY1Y2
ssY1Y2 -> ssY12, ssY13
ssY13 -> ssY13, ssY13
ssY11 -> track structure Y11
ssY12 -> track structure Y12
ssY13 -> track structure Y13
track structure Y11 -> < track A, {ssY11Y1} >
track structure Y12 -> < track B, {ssY12Y1} >
track structure Y13 -> < track C, {ssY13Y1} >
ssY11Y1 -> ssY11Y1, ssY11Y1
ssY11Y1 -> block I
ssY12Y1 -> block J
ssY13Y1 -> ssY13Y1, ssY13Y1
ssY13Y1 -> block K
ssY13Y1 -> block L

One  particular  structure  of  block  names  is: 

< disk file Y,
{ < cylinder,
{ < track A, {block I, block I, ... , block I} >,
< track B, {block J} >,
< track B, {block J} >,
< track B, {block J} >,
< track C, {block K, block K, ... , block L} >,
< track C, {block L, block K, ... , block L} >,
< track C, {block L, block L, ... , block K} >,
< track C, {block K, block L, ... , block K} >,
< track C, {block L, block K, ... , block L} >,
< track C, {block L, block K, ... , block L} > } > } >

Replacing block names block I, block J, block K, and block L with
storage items, we obtain the sssi for a disk file illustrated in Figure 5-2.

This completes the discussion of the modelling of the conceptual
structure of storage. We will now discuss the storage item and storage
structure encoding.
5.2.2 Encoding Storage Items and Storage Structure

This section is analogous to those sections in Chapter 3 on the
encoding of values, attributes and structure. We will therefore
abbreviate this section by simply listing the characteristics which
have to be specified. Storage structure is encoded by encoding block
names, in the same way that record structure is encoded by encoding
attributes. The following table, Table 5-1, indicates the
characteristics required to encode basic blocks and block names.


                 1   2   3   4   5   6   7
basic block      X   X
block name               X   X   X   X   X

Table 5-1. Characteristics Required for Encoding

1. Length. The length of a basic block is the number of unit
storage cells that it contains (e.g., a column of 7 or 9 bits on a tape).

2. Length uniformity. If the basic blocks corresponding to a
particular block name are always of uniform length, then the length can
be described simply by giving the number directly. However, if the
lengths of the basic blocks corresponding to a particular block name
are not uniform, then the length is specified as varying.

3. Labels. Labels at the beginning and end of a basic block
or block provide one way to encode the block name. Labels can be
described simply as a character or bit string.
4. Order. The order of block names can be specified by listing
them in the appropriate order. This order can be used to identify the
block name of the basic block or block being processed. This provides
another way to encode block names.

5. Occurrence. The occurrence of particular kinds of blocks
or basic blocks may be mandatory or optional within a substructure.

6. Repetition number. The repetition number is the number of
times a block name may occur consecutively in a substructure.

7. Repetition uniformity. If the number of times a block name
repeats is always the same (i.e., the repetition of the block name is
uniform), then the repetition number can be specified simply by giving
the number directly. However, if the repetition of the block name is
not uniform, then either the repetition number must be encoded and
stored in a label, or the storage items or sssi's containing the block
name must be delimited by labels.

This  now  completes  the  characteristics  which  must  be  specified 
to  determine  the  rules  for  encoding  storage  items  and  storage  structure. 

As for the encoding characteristics for record and file
descriptions, we allow each characteristic to be specified either:

1) directly - by specifying explicitly the characteristic, or

2) indirectly - by specifying a function which must be computed
to determine the characteristic. The function may be defined
over the values of data items or other characteristics using
the usual arithmetic operators.
For example, the length of a basic block can be specified directly,
or it can be specified indirectly as, perhaps, being equal to the length
of another kind of basic block.

5.2.3  Record  Positioning  and  Pointer  Interpretation  Rules 

We have now seen how storage is structured and encoded. The
storage structure is the framework within which the user distributes the
bit string representation of his file. The user must specify
characteristics which determine how his records are positioned relative to the
basic blocks, and how the storage addresses of the records are to be
determined to produce the pointers for his file.

The user specifies how his records are to be positioned relative
to the basic blocks in terms of characteristics which apply to all
records (e.g., the number of records per block) and characteristics which
apply to particular record types (e.g., whether or not a record of a
given type can be split across a block boundary; whether or not a record
of a given type can occur first in a block). It is understood that a
mechanized process, using the descriptions of the file and record
structure, is available which applies these rules to distribute the bit string
representations of the records appropriately. The characteristics which
specify record positioning are:

1. Record distribution ratio. The number of records stored in
basic blocks can be described as a ratio of records per basic block.
Three examples of such ratios are illustrated below:

[Illustration of three record distribution ratios]

2. Record split set. The record split set is the set of those
record types whose bit string representations may be split between basic
blocks.

When the space remaining in a basic block is not enough to contain
the complete bit string of a record type which may not be split, then
that bit string must be put into the next basic block. The remaining
space in the first basic block is considered to be arbitrarily filled.
The contents of such unused space is called filler.

3. Start record set. The start record set is the set of those
record types that may occur first in a basic block. For example, if the
bit strings representing records of two types, X and Y, are to be
positioned in basic blocks such that only records of type X may occur as
the first record in a basic block, then the start record set is {X}.

4. Alignment set. The alignment set gives, for each type of
record, the set of fields and groups which must be aligned with respect
to storage cell boundaries, whether the alignment is to the left or to
the right, and the padding characters.
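
The first two positioning characteristics can be made concrete with a small sketch. It is a simplified illustration, not the mechanized process the text refers to: records are byte strings rather than bit strings, the function name `position_records`, the parameter `max_per_block` (standing in for the record distribution ratio) and the filler byte are all assumptions, and the start record set and alignment set are not modelled.

```python
FILLER = b"\x00"  # assumption: the model only says filler is arbitrary


def position_records(records, block_len, split_set, max_per_block=None):
    """Distribute (record_type, data) pairs over fixed-length blocks."""
    blocks, current, count = [], bytearray(), 0

    def close_block():
        nonlocal current, count
        # Unused space at the end of a block becomes filler.
        current.extend(FILLER * (block_len - len(current)))
        blocks.append(bytes(current))
        current, count = bytearray(), 0

    for rtype, data in records:
        if max_per_block is not None and count >= max_per_block:
            close_block()
        if rtype in split_set:
            # The record may be split across a block boundary.
            while data:
                if len(current) == block_len:
                    close_block()
                room = block_len - len(current)
                current.extend(data[:room])
                data = data[room:]
        else:
            if len(data) > block_len:
                raise ValueError("unsplittable record longer than a block")
            if len(data) > block_len - len(current):
                close_block()  # record goes into the next basic block
            current.extend(data)
        count += 1

    if current:
        close_block()
    return blocks
```

With a block length of 6 and records X = `aaaa`, Y = `bbbb`, X = `cc`, placing Y in the split set lets Y straddle the first block boundary, whereas an empty split set forces filler after the first record.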

The user specifies how pointers in his file are to be interpreted
in terms of the following characteristics:

5. Pointer type. Pointers may give main memory addresses
or device addresses.

6. Pointer mode. Pointers may be interpreted as giving the
absolute address of a record, or the address relative to some fixed
origin (for example, the first record in the file or a particular block
in the storage structure).

7. Addressing scheme. The addresses of blocks may be represented
by numbers in ascending or descending order beginning with a particular
number. All blocks of the same kind may be numbered sequentially or
all blocks within a particular substructure may be numbered sequentially.
This characteristic must be specified when pointers give device addresses.

8. Pointer form. When the pointers give device addresses the
levels of block addressing used to form each pointer must be specified.
For example, a pointer for a disk may consist of just a cylinder number
or of a cylinder number and a track number.
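
As a rough illustration of pointer mode and pointer form, the following sketch interprets a pointer value under two modes and flattens a disk pointer given as block-level numbers. The function names, the `tracks_per_cylinder` parameter and the choice of ascending numbering from zero are assumptions made for the example only.

```python
def resolve_pointer(value, mode, origin=0):
    """Interpret a pointer value according to its pointer mode."""
    if mode == "absolute":
        return value
    if mode == "relative":
        # Relative to a fixed origin, e.g. the first record of the file.
        return origin + value
    raise ValueError("unknown pointer mode: " + mode)


def device_address(pointer, form, tracks_per_cylinder):
    """Flatten a disk pointer into a track number.

    form names the levels of block addressing present in the pointer,
    e.g. ("cylinder",) or ("cylinder", "track"). Blocks are assumed
    to be numbered in ascending order starting at 0.
    """
    parts = dict(zip(form, pointer))
    return parts["cylinder"] * tracks_per_cylinder + parts.get("track", 0)
```

A pointer of form `("cylinder",)` addresses the first track of the cylinder, while the form `("cylinder", "track")` selects a particular track within it.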

This now completes the characteristics which must be specified to
encode a file in its storage structure.

5.3 An Application of the Model of Storage Structures

We will now illustrate how the model is used to encode the
storage structure of a file into its bit string representation. We will
take the file of sequentially ordered person records given in Example
1 of Chapter 4, and position these records relative to the basic blocks
in the tape file X of the above example.

Applying the following encoding characteristics to the storage
structure, we obtain the bit string representation illustrated in
Figure 5-3.

1.  The  length  of  basic  block  A  is  200  bytes  (9  bit  columns). 

2.  The  length  is  uniform. 

3.  There  are  to  be  three  identical  labels,  in  this  case  tape- 
marks;  one  at  the  beginning  of  tape  file  X,  and  two  at  the 
end  of  tape  file  X. 

4. Since there is only one kind of block in tape file X, no
order is specified.

5.  An  occurrence  of  basic  block  A  in  tape  file  X  is  mandatory. 

6.  There  may  be  any  number  of  occurrences  of  basic  block  A  in 
tape  file  X. 

7.  The  number  of  occurrences  of  basic  block  A  is  not  uniform 
among  occurrences  of  tape  file  X. 

The  record  positioning  rules  are: 

1. The bit string may be split only between records.

2. The distribution ratio is 1 record : 1 block.

3. The start record set for basic block A is {person}.

4. The alignment set is empty.

Pointers  were  not  used  to  implement  the  file  structure  so  no 
pointer  characteristics  are  specified. 


[Figure: tape file X shown as a tapemark, a sequence of tape blocks
(each holding the bit string encoding of one record followed by
filler), and two closing tapemarks]

Figure 5-3. Bit String Representation of Tape File X


5.4 The Completeness and Generality of the Model

We will now show that the model of storage structures is complete
in the sense that it incorporates all of the storage level characteristics
derived in Table 2-1. We will do this by showing that the structuring
characteristics at the storage level of Table 2-1 are properly
contained in the conceptual part of the model and that the implementation
characteristics are contained in the encoding rules of the model.
The structuring characteristics for devices supported by software
systems allow the user varying degrees of freedom in specifying storage
structure at the different device levels. The storage structure permits
the description of the organization of such devices at every level.

The implementation characteristics and record positioning and pointer
interpretation rules are specified directly by the encoding characteristics
of the model. Thus, the model is complete in the above sense.

The  storage  level  characteristics  of  Table  2-1  are  incorporated  in 
more  generalized  forms  in  the  model  to  allow  for  the  description  of 
variations on existing data structures. This generality is
provided in the following ways:

1) The record positioning and pointer interpretation rules provide
greater user control than is provided by current software systems.

2) The block productions permit the description of blocking at
every device level, instead of adhering to the restrictions imposed by
current systems.

3) The model provides a more generalized way to specify
characteristics. The encoding characteristics for storage description as well as
those for file and record descriptions can be specified as depending
on data items, other characteristics and functions of these. This
greatly increases the variety of encodings which can be specified.

4) The model can be used to describe storage structures on any
device that relies on the basic concepts of blocking and labelling.

In  these  ways,  the  model  allows  variations  of  current  data 
structures  at  the  storage  level  to  be  described. 

5.5 The Relationship Between the Model and GDDL

GDDL has been explicitly designed in terms of the model. The
BBLOCK and BLOCK statements are used to describe the conceptual storage
structure of the model. Each encoding characteristic of the basic
blocks and block names can be specified by one or more parameters in
GDDL statements. The parameters and statements for these characteristics
are listed in Table 5-2 given below:


Characteristic            Statements and Parameters             Section in
                                                                 Appendix A

Basic Block Encoding Characteristics
  Length                  BBLOCK statement, parameter (ii)
  Uniformity              BBLOCK statement, parameter (iii)

Record Positioning
  Start record set        BBLOCK statement, parameter (viii)    3.2
  Alignment set           FIELD statement, parameter (vii)      1.1

Pointer Characteristics
  Pointer type            POINTER statement, parameter (ii)     3.3
  Pointer mode            POINTER statement, parameter (iii)    3.3
  Addressing scheme       BLOCK statement, parameter (ii)       3.2
  Pointer form            POINTER statement, parameter (iv)     3.3

Specification of Characteristics
  Direct                  By listed parameters
  Indirect                Parameter and PARAMPROG statements    1.4.4

Table 5-2. The Relationship Between the Model and GDDL


The way in which BBLOCK and BLOCK statements are used to describe
storage structures may be seen by comparing the format of these
statements (see Sections 3.1 and 3.2 of Appendix A) with the definitions
of storage item and storage structure (Definitions 5-1 and 5-3).
The BBLOCK statement has the format:

BBLOCK  (block  name,  encoding  and  record  positioning 
characteristics) 

This  corresponds  to  the  definition  of  storage  item. 

The  BLOCK  statement  has  the  format: 

BLOCK  (block  name,  ...  ;  (list),  ...  ,  (list),  ...  ) 

This  corresponds  to  the  block  production  rules,  with  (list),  ...  ,  (list) 
corresponding  to  all  the  substructures  that  can  be  contained  in  the 
given  structure.  Each  list  may  refer  to  either  another  block  or  to  a 
basic  block. 
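
The correspondence between BLOCK statements and block productions can be pictured as a tree whose leaves are basic blocks. The sketch below is one possible in-memory rendering, not GDDL itself: the class names, the single `length` characteristic and the lengths used are hypothetical, and the block names are borrowed from the disk example of Section 5.6.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BasicBlock:
    """Counterpart of a BBLOCK statement (a storage item)."""
    name: str
    length: int  # one encoding characteristic; the others are omitted


@dataclass
class Block:
    """Counterpart of a BLOCK statement (a block production)."""
    name: str
    substructures: List[Union["Block", BasicBlock]] = field(default_factory=list)


def basic_block_names(node):
    """List the basic blocks reachable from a block, in order."""
    if isinstance(node, BasicBlock):
        return [node.name]
    names = []
    for sub in node.substructures:
        names.extend(basic_block_names(sub))
    return names


# Hypothetical lengths; only the nesting matters here.
cylinder = Block("cylinder", [BasicBlock("track A", 3000),
                              BasicBlock("track B", 3000),
                              BasicBlock("track C", 3000)])
disk_file_y = Block("disk file Y", [cylinder])
```

Each list in a BLOCK statement refers either to another block or to a basic block, which is what the `Union` in `substructures` expresses.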

Since GDDL is based on the model, GDDL is complete and general
in the same way as the model.

5.6  Medium  Dependent  Encoding  Characteristics 

So far in this chapter we have shown via the model that GDDL can
describe storage structures and their encodings. Now we must consider
how such storage structures are tied down to particular media. We must
show how the storage structure is related to the various physical levels
of a storage medium.

The definition of basic block in Section 5.2.1 requires that a
basic block always correspond to the lowest specifiable level of a
medium. However, a block may correspond to one of several physical
levels of a medium. In a disk pack for example, a block may correspond
to a track, several tracks, a cylinder, several cylinders, or a pack.
Therefore, each block must be tied down to a particular physical level.
This is achieved by specifying, for each level of a medium, the names
of the blocks which correspond to that level. In GDDL this specification
is called a device statement (see Appendix A, Section 3.4). A device
statement also specifies other medium dependent characteristics such
as tape density for tapes. As different media have different numbers of
levels and other characteristics, a device statement is needed for each
storage medium.

For  example,  the  DISK  statement  required  for  the  above  disk 
example  is: 

DISK ( ... , pack: 'disk file Y'; cylinder: 'cylinder';
track: 'track A', 'track B', 'track C' )

The device statements then supply all the medium dependent information
that is necessary to physically encode files onto storage media.
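
What a device statement records can be sketched as a mapping from physical levels to block names. The dictionary below restates the DISK example; the representation and the helper `level_of` are illustrative assumptions, and the elided leading parameters of the statement are left out.

```python
# Level of the medium -> names of the blocks tied to that level.
DISK = {
    "pack": ["disk file Y"],
    "cylinder": ["cylinder"],
    "track": ["track A", "track B", "track C"],
}


def level_of(block_name, device_statement):
    """Return the physical level a block is tied down to."""
    levels = [level for level, names in device_statement.items()
              if block_name in names]
    if len(levels) != 1:
        raise ValueError(block_name + " is not tied to exactly one level")
    return levels[0]
```

The check for exactly one level reflects the requirement that each block be tied down to a particular physical level.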

5.7 Demonstrations of GDDL's Completeness

In the previous section we showed that GDDL is complete for
storage description by showing that the model on which it is based is
complete. We now provide several practical examples of its completeness.
These examples demonstrate the use of GDDL in describing real-world
files. The storage descriptions are part of larger examples of complete
conversions of data from one structure to another. These examples are
given in Appendix B.


CHAPTER 6  DATA CONVERSION


6.1 Introduction

In previous chapters we have seen how to model a file, which is
encoded as a bit string on a storage medium, and how such a file may be
explicitly described. Thus, the first two objectives of this dissertation
have been achieved. Now we will show how such descriptions can
be used when data in one structure is to be converted into another
structure. This is our third and final objective.

In  the  first  section  below,  the  conversion  process  is  discussed 
in  general  terms,  and  it  is  shown  that  additional  information  in  the 
form  of  an  association  list  is  required  to  describe  data  conversion. 
Section  6.3  gives  a  model  of  the  concept  of  an  association  list  which 
satisfies  the  requirements  discussed  in  the  previous  section.  Section 
6.4  illustrates  the  model  by  giving  some  applications  of  it  in  examples 
of  data  conversion.  Section  6.5  shows  how  GDDL's  ability  to  describe 
this  aspect  of  data  conversion  is  based  directly  on  the  model  of  the 
association list. Finally, Section 6.6 ties all the previous work
together by showing how and where each part of the necessary descriptions
is used during the conversion of data from one structure to another.

6.2 The Concept of the Association List

The present discussion apparently marks the first time that anyone
has considered precisely what information must be made explicit to
use data descriptions for data conversion. Therefore, we must develop
our ideas from basic concepts, and so a longer intuitive discussion will
be needed. This intuitive discussion will be organized as follows.

First we will specify exactly what data conversion is, using terms
developed in previous chapters. Then we will take a rather simplified
view of data conversion to introduce our basic notion - that of an
association list. Finally, we will show how this notion of an association
list needs to be made more elaborate so that all the requirements for
specifying data conversion are met. We begin then by providing a
definition of data conversion.

Definition  6-1.  Data  Conversion  is  a  process  which,  given  the  bit 
string  representation  of  a  file  (or  set  of  files)  on  a  storage 
medium,  produces  the  bit  string  representation  on  a  (different) 
storage  medium  of  a  file  (or  set  of  files)  whose  data  items  contain 
values  from  the  data  items  in  the  first  file  (or  set  of  files). 

The structure of the first file(s) is different in general from the
structure of the second file(s).

At this point, it will be convenient to introduce some new
terminology.

Given a set of files $X_1, \ldots, X_n$ whose data are to be converted
into a set of files $Y_1, \ldots, Y_m$, we call the files $X_1, \ldots, X_n$ the
source files and the files $Y_1, \ldots, Y_m$ the target files.

Note that from the above definition, the value of a target data
item must always be the value of some source data item. As the data
item is the most primitive structure in the model, we do not provide
the capability for decomposing a source value into smaller components
and using these components for target values. For example, if "TOMJONES"
is a source value, the only target value that may be obtained from this
value is also "TOMJONES" and not just "TOM" or "JONES". This is not
a serious restriction as a user would normally form a separate data
item for each type of value to be processed independently. Thus, in the
above case, "TOM" might be the value of one data item and "JONES" the
value of another data item, and both data items would be structured as
a source group.

Note also that from the above definition, data conversion is not
necessarily a reversible process. That is, if file A is converted into
file B, it does not follow that file B can be converted back to file A.
A simple counter-example is when file B only contains a subset of the
values of data items in file A.

We will now develop the basic idea of the process of data
conversion. Consider the case of a user who wishes to convert data from
a single source file to a single target file. We will assume that the
user has the bit string representation of the source file stored on a
storage medium, and that he has descriptions of the source and target
files. The object is to form the bit string representation of the
target file.

This object can be achieved in essentially the following way:

1)  Use  the  description  of  the  source  file  to  break  down  the  bit 
string  representation  so  that  the  actual  data  items  (attribute-value  pairs) 
are  obtained. 

2)  Provide  a  specification  of  how  the  values  of  source  data  item 
attributes  are  to  be  combined  with  target  data  item  attributes  for  the 
target  file. 

3)  Form  the  target  data  items  according  to  this  specification. 

4) Structure and encode these data items according to the
description of the target file to obtain the bit string representation.

We  see  that  the  only  additional  information  that  the  user  needs 
to  provide  is  a  list  (corresponding  to  item  2  above)  which  gives,  for  each 
target  attribute,  the  source  attribute  which  is  to  provide  its  value. 

We  will  call  such  a  list  an  association  list  because  it  associates  each 
target attribute with a source attribute. Thus, an association list is
a set of target attribute-source attribute pairs.
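
In its simplest form, steps 2 and 3 of the process above amount to a lookup driven by the association list. The sketch below assumes that step 1 has already decoded a source record into attribute-value pairs and leaves step 4 (encoding) aside; the attribute names and values are hypothetical.

```python
def convert_record(source_items, association_list):
    """Form target data items from one decoded source record.

    association_list is a collection of (target attribute, source
    attribute) pairs; source_items maps source attributes to values.
    """
    return {target: source_items[source]
            for target, source in association_list}


association_list = [("NAME", "EMPLOYEE-NAME"), ("AGE", "EMPLOYEE-AGE")]
source_items = {"EMPLOYEE-NAME": "TOMJONES",
                "EMPLOYEE-AGE": 34,
                "SALARY": 100}
target_items = convert_record(source_items, association_list)
```

Because `SALARY` has no target attribute in this association list, its value does not reach the target record, which is one way a conversion fails to be reversible.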

The essence of this conversion process is illustrated in Figure 6-1.
Emphasis is placed here on the role of the association list, because it
is this concept that we now want to develop. We will wait until Section
6.6 to give a detailed treatment of how the various parts of the source
and target descriptions relate to the conversion process.

With this concept in mind, we can go on to a more elaborate
discussion of the requirements for an association list to completely
describe associations between source and target attributes.

Figure 6-1. Simplified Conversion Process


The above view of an association list, as a set of target
attribute-source attribute pairs, is adequate only if all the data items of
a target record are obtained from a single source record. However, in
general, the value for a data item of a target record may be obtained
from any data item in any record in the source file. There are two cases
to consider. First we consider when the data items of a given target
record are obtained from several different source records. In this case
we must identify which source record contains the appropriate value of
the source attribute.

Secondly, consideration must be given to repeating groups and
fields within the source record. It may be necessary to identify from
which occurrence of the repeating group or field the value of the source
attribute is to be taken. If it is known that the jth occurrence is to
provide the value, this can be specified by indexing the source attribute
appropriately in the association list. Otherwise, we must develop a
further means for identifying appropriate occurrences.

We can summarize the requirements as follows. To properly
identify the source attribute for a target attribute-source attribute pair,
the association list must include the capability of identifying:

1)  the  required  source  record,  and 

2)  the  required  occurrence  of  a  repeating  group  or  field  in  the 
source  record. 

We will first consider how the required record can be identified.
The problem here is similar to the one in Chapter 4 for specifying which
records are connected by access paths. In that chapter the solution was
to provide criteria which determine that two records are to be linked if
they satisfy the criteria. We can use a similar solution here. We can
specify criteria which select a record to supply a value for a source
attribute if that record satisfies the criteria. In this way we can use
the power of the criterion production system (cps) which we already have
available, and avoid constructing additional mechanisms.

We saw in Chapter 4 that the cps is a general system for producing
any criteria over records in terms of values, characteristics and paths.
However, the records which could be named (see record-modifier in cps)
in these criteria were only head record, tail record, and arbitrary
(variable) records. To use this criterion production system for present
purposes, we shall see that three extra productions must be added to
provide new ways of naming records which are involved in criteria. In the
following discussion we will determine what source records need to be
named, and develop a way to name them appropriately. It will turn out
that there are occasions when the names we develop can be used by
themselves to identify directly appropriate records in the association list.

We first observe that the data in the source file is organized
in some way which is meaningful to the user. This is very apparent from
the models of record and file description in previous chapters. Similarly,
the target file will be another meaningful organization of this same
data. Consider the implications of this observation for the criteria
that the user will want to introduce into the association list to
identify particular records. Let us assume the user needs to write criteria
for some target record that is to contain attributes $A_{T_1}$ and $A_{T_2}$. He
has decided that the source attributes which are to provide the values
will be $A_{S_1}$ and $A_{S_2}$. The association list will therefore contain the
pairs $\langle A_{T_1}, A_{S_1} \rangle$ and $\langle A_{T_2}, A_{S_2} \rangle$. Let us assume further that the
record containing the occurrence of $A_{S_1}$ which provides the value for
$A_{T_1}$ has been selected. Now the user wants to specify criteria for
selecting the record containing the occurrence of $A_{S_2}$ which provides the
value for $A_{T_2}$. Because the data items in the target record are supposed
to be related to each other, he may want to use the values of some data
items in the source record, which provides the value for $A_{T_1}$, in
selecting the record containing the occurrence of $A_{S_2}$.

For example, assume each source record contains an attribute $A_{S_i}$.
The user may want the record which contains the occurrence of $A_{S_2}$
(providing the value for $A_{T_2}$) to have the same value for $A_{S_i}$ as the value
for $A_{S_i}$ in the record which provides the value for $A_{T_1}$. This is
illustrated in Figure 6-2.


[Figure: two source records of type X. First record: the occurrence
of $A_{S_1}$ in this record provides the value for $A_{T_1}$. Second record: in
this record, the value of $A_{S_i}$ is equal to $v_i$; thus, the occurrence of
$A_{S_2}$ in this record provides the value for $A_{T_2}$.]

Figure 6-2. An Example of Source Record Selection for
the Formation of Target Records

The above example illustrates the case when the target record contains
two attributes. In the general case when the target record contains n
attributes, $A_{T_1}, A_{T_2}, \ldots, A_{T_i}, \ldots, A_{T_n}$, the selection of the source
record which provides the value for $A_{T_i}$ ($2 \le i \le n$) may depend on values
in those source records which provide the values for $A_{T_1}, \ldots, A_{T_{i-1}}$.
We will see an example of such a case in Section 6.4.

We must, therefore, provide the criterion production system (cps)
with the capability of referring to a record that has already provided a
value for some target attribute. Let us introduce the terminology
SOURCE ($A_T$) to mean the source record which provides the value for $A_T$.
Now we can say the selection of the record providing the value for $A_{T_i}$
may depend in general on the values of data items in SOURCE ($A_{T_1}$), ... ,
SOURCE ($A_{T_{i-1}}$). Using this terminology we can express the criterion in
the example of Figure 6-2 for selecting the source record containing the
occurrence of $A_{S_2}$ which provides the value for $A_{T_2}$ as:
$A_{S_i}$ OF X = $A_{S_i}$ OF SOURCE ($A_{T_1}$).

We have thus determined a way of naming a source record relative
to the target attribute for which it contributes the value. This way of
naming source records provides a method for obtaining criteria to identify
appropriate source records in the association list. In addition, if the
record which contains a source attribute is required to be SOURCE ($A_T$),
for some $A_T$, we can use this name itself for identifying the appropriate
record. This is the case in the association list provided for Example 1
in Section 6.4. These capabilities satisfy the first requirement above.

We now consider the second requirement when the value for a
target attribute is to come from a source attribute in a repeating group
or field (see Section 3.3). The naming problem here is analogous to
the above case when a value for a target attribute comes from a
particular source record. That is, the user may want subsequent values for
other target attributes either to come from the same group or to be
determined by other values in that group, or to be determined by encoding
characteristics of that group or field. Thus, we need a way of referring
to source groups or fields that have already been used to supply values
to the target record. We can name them by appending the attribute of
the group or field in question to our previous term SOURCE ($A_T$). For
example, to refer to a repeating group of type X which was the source
of the value for target attribute $A_T$, we say X-SOURCE ($A_T$). If we wish
to refer only to a higher level group of type Y which contains X, we
say Y-SOURCE ($A_T$). We have thus determined a way to name a repeating
source group or field relative to the target attribute for which it
contributes the value. Again this name may either be used as part of
criteria which identify the required occurrences of the repeating
group or field, or the name may be used by itself when it happens to
name the required occurrence. These capabilities satisfy the second
requirement above.
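
The role of SOURCE ($A_T$) in record selection can be illustrated with a small sketch modelled on the example of Figure 6-2. It is an assumption-laden simplification: records are dictionaries, a criterion is a Python function rather than a cps expression, the attribute names `S1`, `S2`, `Si`, `T1`, `T2` stand for $A_{S_1}$, $A_{S_2}$, $A_{S_i}$, $A_{T_1}$, $A_{T_2}$, and repeating groups are not modelled.

```python
def form_target(records, first_record, entries):
    """Build one target record from several source records.

    entries is a list of (target attribute, source attribute,
    criterion). A criterion receives a candidate record and the
    mapping `source`, where source[t] plays the role of SOURCE(t):
    the record that already supplied the value of target attribute t.
    A criterion of None means "use first_record".
    """
    source, target = {}, {}
    for target_attr, source_attr, criterion in entries:
        if criterion is None:
            record = first_record
        else:
            # Raises StopIteration if no record satisfies the criterion.
            record = next(r for r in records if criterion(r, source))
        source[target_attr] = record
        target[target_attr] = record[source_attr]
    return target


records = [
    {"Si": "v1", "S1": "a"},
    {"Si": "v2", "S2": "z"},
    {"Si": "v1", "S2": "b"},
]
entries = [
    ("T1", "S1", None),
    # Si OF X = Si OF SOURCE(T1)
    ("T2", "S2", lambda r, source: "S2" in r and r["Si"] == source["T1"]["Si"]),
]
```

The second entry selects the record whose `Si` value matches that of the record which supplied `T1`, so the value `z` from the non-matching record is passed over.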

There is one additional situation to consider that is related to
the two requirements above. This situation occurs when a target
attribute happens to be the attribute of a repeating target field or group.
We may have to identify the particular occurrence of a target attribute
$A_T$ when it appears in names of the form SOURCE ($A_T$). To do this we
must make our naming method a little more elaborate. If the number of
the occurrence is known, then the attribute can simply be indexed.
However, if the number of the occurrence is not known we must make
provision for a criterion which identifies $A_T$. Thus, to refer to the
source of the value for a target attribute $A_T$ in a repeating (target)
group or field, we use either the name SOURCE ($A_T$ index) or the name
SOURCE ($A_T$, criterion).

We have thus seen how to name records, and thence groups and
fields which may be used in defining criteria to select particular
occurrences of source attributes. This naming method together with the
cps of Chapter 4 allows criteria to be specified which meet the two
requirements for properly identifying source attributes appearing in
the association list.

So far we have only considered conversion of a single source
file to a single target file. In the general case of converting
several source files to several target files, the association list must
specify, for each target attribute-source attribute pair, the file to
which the target attribute belongs and the file to which the source
attribute belongs.

In the next section we will present a model of the concept of
association list based on the discussion above. We will then give
some examples of the application of this model in data conversion.

6.3 A Model of the Association List

Definition 6-2. An association list is a set of six-tuples of the
form:

< target attribute, target file name; source attribute, source
file name; record identification, attribute repetition
identification >

where: 1) target attribute is the attribute in the target
record to be provided with a value,

2) target file name is the name of the target file in
which the target attribute occurs,

3) source attribute is the attribute in the source record
which is to provide the value for the target attribute,

4) source file name is the name of the source file in
which the source attribute occurs,

5) record identification is optional; it is either a
name for a record or a criterion which can be expressed
using the criterion production system specified below,
and which is used to identify the source record in
which the source attribute occurs,

6) attribute repetition identification is optional; it
is either a name for a group or field or a criterion
which is used, when the source attribute occurs in a
repeating group or field, to identify the particular
occurrence in which the source attribute occurs.
When the particular repetition is always uniform and
mandatory, the source attribute is simply indexed;
otherwise, the criterion is expressed using the
criterion production system specified below.
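
As an illustration only, the six-tuple of Definition 6-2 can be sketched as a small Python record type. The class name `AssociationEntry`, its field names, and the helper `source_files` are assumptions made for this sketch and are not part of the model or of GDDL; the two optional components are held as uninterpreted strings rather than parsed criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssociationEntry:
    """One six-tuple of an association list (hypothetical representation)."""
    target_attribute: str
    target_file: str
    source_attribute: str
    source_file: str
    # Optional components: a name or a criterion, kept as plain text here.
    record_identification: Optional[str] = None
    repetition_identification: Optional[str] = None


def source_files(entries):
    """Return the set of source files that an association list draws on."""
    return {entry.source_file for entry in entries}


# An association list with one plain entry and one entry that names
# the source record through a SOURCE reference.
entries = [
    AssociationEntry("name", "F2", "name", "F1"),
    AssociationEntry("age", "F2", "age", "F1", "SOURCE (name)"),
]
```

Keeping the optional components as defaults of `None` mirrors the definition, in which components 5) and 6) may be omitted from a tuple.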


Criterion  Production  System: 

This system contains the productions of the system in Chapter 4,
and, in addition, the productions:

source-reference -> SOURCE (attribute-modifier)
source-reference -> SOURCE (attribute-modifier, criterion)
record-modifier -> source-reference
attribute-modifier -> attribute - source-reference

These productions allow a record, group or field to be named as
that record, group or field in the source file which provides the
values for the given attribute in the target file.

The following convention is to be observed in specifying an
association list: When a source attribute A_S repeats and

1) when a target attribute A_T does not repeat, then specifying

a) < A_T; A_S; ... > implies that one target record is to
be formed for each value of A_S (i.e., if there are n
such values of A_S, then n target records are formed);

b) < A_T; A_S(i); ... > implies that only one target record
is to be formed and the remaining values of A_S are to be
discarded (i.e., are not to be used as values for A_T in
other target records);

2) when a target attribute A_T repeats an unlimited number of
times, then specifying < A_T; A_S; ... > implies that A_T will
repeat exactly as many times as A_S repeats;

3) when a target attribute repeats either a fixed or bounded
number of times, say m, then specifying

a) < A_T; A_S; ... > implies that whenever the number of A_T
repetitions is less than the number of A_S repetitions,
then target records are to be formed such that each
value of A_S appears in some target record;

b) < A_T(1); A_S(1); ... >
...
< A_T(m); A_S(m); ... > implies that whenever the number
of A_T repetitions is less than the number of A_S repetitions,
then only one target record is to be formed with
the ith value of A_S as the ith value of A_T and the
remaining source values of A_S are to be discarded.
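
The convention can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive reading: the function name `form_target_values` and its parameters are hypothetical, `repeats=0` stands for a non-repeating target attribute, `repeats=None` for unlimited repetition, and an integer m for a fixed or bounded repetition. For case 3) a) the text only requires that every source value appear in some target record; filling records in consecutive runs of m values is one possible way to satisfy that, chosen here for simplicity.

```python
def form_target_values(source_values, repeats, indexed=False, i=1):
    """Return a list of target records, each a list of values for A_T.

    source_values: the values of the repeating source attribute A_S
    repeats: 0 (A_T does not repeat), None (unlimited), or an integer m
    indexed: True when the entry is written with explicit indices
    i: the 1-based index used in case 1) b)
    """
    values = list(source_values)
    if repeats == 0:
        if indexed:
            return [[values[i - 1]]]          # case 1) b): one record, rest discarded
        return [[v] for v in values]          # case 1) a): one record per value
    if repeats is None:
        return [values]                       # case 2): A_T repeats as often as A_S
    m = repeats
    if indexed:
        return [values[:m]]                   # case 3) b): first m values, rest discarded
    # case 3) a): assumed strategy, consecutive runs of at most m values
    return [values[k:k + m] for k in range(0, len(values), m)]


sample = ["v1", "v2", "v3"]   # hypothetical source values
```

For the sample values, the non-indexed, non-repeating case yields three one-value records, while the bounded case with m = 2 yields two records holding two values and one value respectively.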

6.4 Applications of the Model of the Association List
Example  1.  Extraction  of  a  New  File  from  an  Existing  File 

Consider a source file F1 whose records are described in the
following way:

i) The structures of the records are described by the set of
productions P1:

record structure -> structure R1
structure R1 -> < person, {substructure R1R1} >
substructure R1R1 -> substructure R11, substructure R1R2
substructure R1R2 -> substructure R12, substructure R1R3
substructure R1R3 -> substructure R13, substructure R14
substructure R11 -> name
substructure R12 -> age
substructure R13 -> sex
substructure R14 -> null
substructure R14 -> substructure R14, substructure R14
substructure R14 -> structure R14
structure R14 -> < book, {substructure R14R1} >
substructure R14R1 -> substructure R141, substructure R14R2
substructure R14R2 -> substructure R142, substructure R143
substructure R141 -> title
substructure R142 -> pages
substructure R143 -> date

ii)  The  encoding  of  the  records  is  specified  by  a  set  of 
characteristics  Cl  (the  exact  specification  of  these 
characteristics  is  not  required  for  the  purpose  of  this 
example) . 

Consider a target file F2 whose records are described in the
following way:

i) The structures of the records are described by the set of
productions P2:

record structure -> structure R2
structure R2 -> < person, {substructure R2R1} >
substructure R2R1 -> substructure R21, substructure R2R2
substructure R2R2 -> substructure R22, substructure R23
substructure R21 -> name
substructure R22 -> age
substructure R23 -> sex

ii) The encoding of the records is specified by a set of
characteristics, C2 (omitted here).

To convert data in source file F1 to the form of target file F2,
the following association list is provided:

< name, F2; name, F1 >

< age, F2; age, F1; SOURCE (name) >

< sex, F2; sex, F1; SOURCE (name) >

Given a record:

    JONES   32   M   SCIENCE I   384   1958   SCIENCE II   501   1963

it is converted as follows:

    source record:  JONES   32   M   SCIENCE I   ...
    target record:  JONES   32   M
All the source records of F1 are converted in this way.
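
The extraction in Example 1 can be mimicked with a short Python sketch. It assumes, purely for illustration, that a source record is held as a dictionary with a list of book groups and that each association list entry is reduced to a four-element tuple; the function name `extract` is hypothetical. Processing one source record at a time plays the role of SOURCE (name), since 'age' and 'sex' are then always taken from the record that supplied 'name'.

```python
# Source file F1, one record, using the values shown in the example.
F1 = [
    {"name": "JONES", "age": 32, "sex": "M",
     "books": [
         {"title": "SCIENCE I", "pages": 384, "date": 1958},
         {"title": "SCIENCE II", "pages": 501, "date": 1963},
     ]},
]

# (target attribute, target file, source attribute, source file)
ASSOCIATION_LIST = [
    ("name", "F2", "name", "F1"),
    ("age", "F2", "age", "F1"),
    ("sex", "F2", "sex", "F1"),
]


def extract(source_records, association_list):
    """Form one target record per source record from the listed attributes."""
    target_records = []
    for record in source_records:
        target = {}
        for target_attr, _target_file, source_attr, _source_file in association_list:
            target[target_attr] = record[source_attr]
        target_records.append(target)
    return target_records


F2 = extract(F1, ASSOCIATION_LIST)
```

The book groups are simply not mentioned in the association list, so they do not appear in the target records.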

Example 2. Reorganization of a File

Consider the same source file F1, described in Example 1, and a
target file F3 whose records are described in the following way:

i) the structures of the records of F3 are described by the set
of productions P3:

record structure -> structure R3
structure R3 -> < book, {substructure R3R1} >
substructure R3R1 -> substructure R31, substructure R3R2
substructure R3R2 -> substructure R32, substructure R3R3
substructure R3R3 -> substructure R33, substructure R34
substructure R31 -> title
substructure R32 -> substructure R32, substructure R32
substructure R32 -> author
substructure R33 -> date
substructure R34 -> pages

ii) the encoding of the records is specified by a set of
characteristics C3 (omitted here).

To convert data in source file F1 to the form of target file F3
such that there is one target record for each value of 'title' in a
source record and each target record contains the 'author' from all
source records containing the same 'title', the following association
list is needed:

< title, F3; title, F1 >

< author, F3; name, F1; criterion >

< date, F3; date, F1; SOURCE (title), book-SOURCE (title) >

< pages, F3; pages, F1; SOURCE (title), book-SOURCE (title) >

where criterion is: (title) = (title book-SOURCE (title))

This criterion can be stated in English as follows:

A source record must be chosen such that the value of the
attribute 'title' in that record is equal to the value of the
attribute 'title' in the source group 'book' which was
previously selected to provide the value for the target
attribute 'title'.

Given the following two records from F1:

    JONES   32   M   SCIENCE I    384   1958   SCIENCE II   501   1963

    ROE     38   M   SCIENCE II   501   1963

they are converted as follows:
This diagram illustrates the formation of two target records. The
process continues until all of the target records have been formed.

We will now discuss in detail the formation of the second target
record. During the formation of the first target record, it was noted
that the association list entry for the attribute 'title' followed
convention 3) a. Therefore, the source record containing the value
SCIENCE I is now checked to see if it contains additional titles which
can be used to form additional target records - it contains the 'title'
SCIENCE II. This is used as the value for the target attribute 'title'
in a new target record. To find values for the target attribute
'author', all records are checked to see if they contain a value of
'title' equal to the value of 'title' obtained for the target attribute
'title' (in this case SCIENCE II). The two records shown above contain
such a value. Therefore, they are used as sources for the values of the
target attribute 'author'. In this way, the values JONES and ROE are
obtained. Finally, the values for 'date' and 'pages' are obtained from
the same group 'book' in the same record which was the source of the
value SCIENCE II. In this way, the target record for SCIENCE II is
formed.
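
The reorganization of Example 2 can be sketched in Python under the same illustrative assumptions as before: source records are dictionaries with a list of book groups, and the function name `reorganize` and the key `authors` are hypothetical. The sketch forms one target record per distinct 'title', takes 'date' and 'pages' from the book group that first supplied the title, and collects 'author' values from every source record containing an equal 'title', which is what the criterion expresses.

```python
F1 = [
    {"name": "JONES", "age": 32, "sex": "M",
     "books": [
         {"title": "SCIENCE I", "pages": 384, "date": 1958},
         {"title": "SCIENCE II", "pages": 501, "date": 1963},
     ]},
    {"name": "ROE", "age": 38, "sex": "M",
     "books": [
         {"title": "SCIENCE II", "pages": 501, "date": 1963},
     ]},
]


def reorganize(people):
    """Form one book record per distinct title, collecting every author."""
    books = {}  # title -> target record; dictionaries keep insertion order
    for person in people:
        for book in person["books"]:
            # The first group seen for a title supplies 'date' and 'pages'.
            record = books.setdefault(book["title"], {
                "title": book["title"],
                "authors": [],
                "date": book["date"],
                "pages": book["pages"],
            })
            record["authors"].append(person["name"])
    return list(books.values())


F3 = reorganize(F1)
```

A single pass with a dictionary keyed on title is a convenience of this sketch; the text describes the same result as a scan of all records for each selected title.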

6.5  The  Relationship  Between  the  Model  and  GDDL 

The  model  of  an  association  list  defined  in  the  previous  section 
provides  a  means  for  explicitly  stating  how  target  data  items  are 
formed  from  source  data  items  during  conversion. 

GDDL's ability to describe data conversion has been defined in
terms of this model and thus provides similar capabilities.

We will now show how the model and GDDL are related. GDDL's
ASSOCIATE statement (see Appendix A, Section 2.3.1.1) is an exact
image of the association list six-tuples. Target and source file
names appear as part of the target and source names (parameters i) and
ii)). The SOURCE (attribute-modifier, criterion) naming scheme appears
explicitly as GDDL's SOURCE statement (see Appendix A, Section 2.3.1.3).
Thus, we conclude that GDDL can specify any association list
that can be defined using the model.
6.6 The Conversion Process

The association list completes the information needed to describe
explicitly how data is to be converted from one organization to another.
In this section we will see how and where each component of the
description for the source and target files, together with the
association list, is used during the conversion process.

In Figure 6-1, we showed that the conversion process consists of
essentially three parts. First, the source file is broken down into
its component data items using the source description; second, the
target data items are formed using values obtained from source data
items; and lastly, the target data items are structured and encoded
according to the target description. Figure 6-3, which is a detailed
treatment of the conversion process, essentially reflects these same
three stages in the instance of conversion from several source files
to several target files.
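
The three parts can be pictured with a toy Python pipeline. Everything concrete in it is an assumption made for the sketch: the source encoding ('attr=value' pairs separated by semicolons), the target encoding (fixed-width fields padded with spaces), and the function names `break_down`, `form_items`, `build` and `convert`. The association is reduced to a mapping from target attribute to source attribute.

```python
def break_down(source_text):
    """Part 1: split a source record into (attribute, value) data items."""
    return [tuple(pair.split("=", 1)) for pair in source_text.split(";") if pair]


def form_items(source_items, association):
    """Part 2: form target data items from the values of source data items."""
    values = dict(source_items)
    return [(target, values[source]) for target, source in association.items()]


def build(target_items, widths):
    """Part 3: structure and encode the target items as fixed-width fields."""
    return "".join(value.ljust(widths[attr])[:widths[attr]]
                   for attr, value in target_items)


def convert(source_text, association, widths):
    """Run the three parts in order for a single record."""
    return build(form_items(break_down(source_text), association), widths)


association = {"name": "name", "age": "age", "sex": "sex"}
widths = {"name": 8, "age": 3, "sex": 1}
result = convert("name=JONES;age=32;sex=M", association, widths)
```

Here `result` is the 12-character string made of JONES padded to eight characters, 32 padded to three, and M. Keeping the three parts as separate functions reflects the point of the figure: the source description is needed only in the first part and the target description only in the last.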

Figure 6-3(a) shows how source descriptions are used to read the
source files from the storage media and break the bit string
representation down into data items, and how the association list
controls the process.

Figure 6-3(b) shows how the target data items are formed, and
Figure 6-3(c) shows how these data items are organized into a target
file and written onto the storage media.

Figure 6-3 is not an algorithm for converting data. It only
shows the order in which description components are used for extracting
a single data item from a source file, and for converting the value of
this data item into part of the target file. In conversion proper,
when large numbers of data items must be extracted, much of the
processing for each data item will be done in parallel with that for
other data items for efficiency.

Let us follow the conversion process using Figure 6-3.

We  will  assume  that  the  process  is  underway  and  several  records 
for  a  particular  target  file  have  already  been  constructed.  Some  of 
the  data  items  for  the  next  target  record  have  already  been  formed 
and  we  will  now  follow  the  formation  of  the  next  data  item. 


The target record structure determines the attribute for this
next data item. We must now begin at the top of Figure 6-3(a).

The association list (1) identifies which source file contains
the attribute whose value will be combined with the target attribute.

The storage structure description for that source file is
used to determine which blocks must be read (i.e., which blocks contain
records of the file).

The storage encoding characteristics (3) are needed to read these
blocks off the storage medium and to remove any labels.

Once the bit string representation of the file is obtained, the
association list (4) identifies which source record is needed. To
locate and extract the bit string representation of the record, the
criterion used for sequencing the records (5) and the file encoding
characteristics are used. If the association list (4) contains a
criterion for identifying the source record, many records may have to
be extracted and tested against this criterion. If the file contains
access paths which are implemented by pointers, the access path criteria
may be used to locate the source record. When the pointers are
stored in tables, the appropriate file containing the table must be
read and then the file encoding characteristics must be used in
extracting the records containing the pointers.

Figure 6-3(b). The Formation of Target Data Items from Source Data Items

Once the bit string for the source record has been located and
extracted, the association list (7) identifies the attribute of the
data item in the record which is to be extracted. The record structure
description for the record gives the location of the attribute
relative to the other attributes in the record. The attribute encoding
characteristics and the alignment set characteristics of the storage
description (10) are then used to extract the data item from the record.
The alignment set characteristic is needed to recognize fields which
are preceded by pad characters aligning them with respect to storage
cell boundaries.

This  completes  the  process  of  extracting  the  desired  data  item 
from  the  source  file. 

The process of forming the target data item from the source data
item just extracted is illustrated in Figure 6-3(b).

The target value is obtained by removing attribute markers as
specified by the attribute marker characteristic (11). Then the value
encoding characteristics of the source and target descriptions (12)
are compared to determine any changes that must be made in the value
string (e.g., adding or removing pad characters to create the correct
length, converting from one character code to another). Target
attribute markers as specified by the attribute marker characteristic
(13) are concatenated onto the value string and it is ready to be used
as a target data item in generating a target record.
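
The formation of a single target data item can be sketched in Python. The sketch assumes, only for illustration, that data items are character strings, that an attribute marker is a prefix, and that padding is a single trailing pad character; the function name `form_target_item` and all of its parameters are hypothetical, and the optional `recode` argument stands in for a character code conversion.

```python
def form_target_item(source_item, source_marker, source_pad,
                     target_length, target_pad, target_marker, recode=None):
    """Turn one source data item into one target data item."""
    # Remove the source attribute marker to obtain the value string.
    if source_item.startswith(source_marker):
        value = source_item[len(source_marker):]
    else:
        value = source_item
    # Reconcile the value encodings: drop source padding, recode if asked,
    # then pad or truncate to the target length.
    value = value.rstrip(source_pad)
    if recode is not None:
        value = recode(value)
    value = value.ljust(target_length, target_pad)[:target_length]
    # Concatenate the target attribute marker onto the value string.
    return target_marker + value


item = form_target_item("N:JONES***", "N:", "*", 8, " ", "#")
```

With these sample arguments the source marker N: and the three pad characters are removed, the value JONES is padded with spaces to eight characters, and the target marker # is placed in front. The target pad must be a single character because it is passed to `str.ljust`.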


We will now follow the creation of a target file to explain the
processes illustrated in Figure 6-3(c).

The target data items which have been obtained by the process
above are organized according to the target record structure
description (14). Then the structure is encoded using the attribute
encoding characteristics.

As the target records are generated they are sequenced according
to the criterion (16) which determines sequencing. The file encoding
characteristics for the target file are checked to see if tables
of pointers or embedded pointers are to be created. If this is the
case, the criteria are used to determine which records are to be
linked by the pointers. The encoding characteristics (19) are used to
create the pointers. When the pointers are to be stored in tables,
the creation of the tables is treated as the creation of another file.

While the target records are being generated, sequenced, and the
required access paths encoded, the storage structure description
is used to set up the storage format on the devices indicated. Finally,
the record positioning and pointer interpretation rules (21) are used
to break the bit string representation of the file into blocks for
insertion into the storage structure. At this time, fields are aligned
and actual addresses for the media in question are used to implement


CHAPTER  7  CONCLUDING  REMARKS 


We  conclude  this  report  by  assessing  the  contributions  and 
implications  of  this  work  for  the  data  processing  field. 

The work on data description presented in this report is just
the beginning of an important new area. In many ways, the development
of data description languages is analogous to the development of
higher level programming languages. Programming languages were developed
to make it easier for humans to prescribe algorithms to machines and to
communicate algorithms among themselves. Data description languages
are needed to make it easier for humans to describe data structures to
machines and to communicate data structures among themselves.
Programming languages have vastly increased the power and applicability
of computers, and it is anticipated that ddl's will equally stimulate
the data processing field.

The  development  of  programming  languages  was  the  result  of  many 
studies  of  their  theoretical  and  practical  properties,  in  such  areas 
as  linguistics,  automata  theory,  compiler  design,  and  parsing  algorithms. 
It is expected that data description languages will require an equally
intensive study before their power and applicability are thoroughly
understood. It is important that this study proceed on abstract and
practical planes simultaneously, so that each may stimulate the other
and neither will lose sight of their intended goals.
This report has contributed to the study of data description
languages in both these planes. First, we have developed an actual
language which can be implemented to study the practical problems in
the design of a GDDL processor*. This implementation will provide
the first real measures of the efficiency and effectiveness of ddl's.
Secondly, we have contributed on a more formal plane by providing the
first complete models of data structure. Before a theoretical study of
data description languages can be undertaken, we must ensure that we
sufficiently understand data structures to know what must be formalized.
These models provide a first step toward a formal study of ddl's.

In  addition,  we  chose  to  examine  in  detail  one  major  application 
of  a  ddl  -  namely,  the  conversion  of  data  from  one  structure  to 
another.  We  found  that  information  about  the  relationship  between  the 
two  structures  must  be  provided  in  addition  to  the  description  of  the 
structures  themselves.  We  have  developed  a  method  for  stating  this 
information  based  on  the  concept  of  the  association  list.  This  enabled 
us  then  to  show  how  data  can  be  converted  using  ddl  descriptions  and 
where  the  parts  of  the  descriptions  are  used  in  the  process. 

We will now make some suggestions for future research related to
data  description  languages. 

As a general comment, we stress the importance of developing
complete models of data structures other than the one presented here.
This is too new a field to concentrate totally on one single approach.
There is much to be gained by developing and implementing ddl's based
on radically different concepts.

* GDDL is being implemented at the Moore School of Electrical
Engineering, University of Pennsylvania (Ha 1971).

One  suggestion  for  research  is  based  on  the  following  observation. 
Traditionally,  computer  scientists  have  emphasized  a  dichotomy  in 
information  processing,  namely,  the  algorithm  which  processes  the  data, 
and  the  data  which  is  processed  by  the  algorithm.  This  emphasis  seems 
to  be  largely  due  to  the  disproportionate  effort  devoted  to  the  study 
of algorithms in comparison to data structures. The work in this
report suggests that a more fruitful view of information processing
might be to regard it as a trichotomy - namely, the algorithm, the
data  structure  for  that  algorithm,  and  the  access  method  by  which  the 
algorithm  obtains  data  from  that  data  structure. 


Figure 7-1. The Trichotomy of Information Processing


It has been a central theme of this report that data structures
are understood and manipulated best by separating the specification of
the structures from the specification of the procedures that access
data in the structures. However, programming language theorists,
particularly those in extensible languages, tend to relegate everything
that is not "purely algorithmic" to the data structure, including the
access methods. We therefore suggest that access methods should be
treated as a separate entity. It is not necessary to associate an
access method with a single data structure or vice versa. For example,
a list can be accessed as a pushdown or a queue. There are many models
of algorithms which have been studied; now we have introduced a model
for data structures. Future research should be devoted to studying
how algorithms can effectively access (i.e., store and retrieve) data
from a data structure. This can be achieved in terms of a model for
algorithms and a model for data structures. The specification of such
access methods would then become a separate component of a programming
language or a part of the languages associated with Generalized Data
Base Management Systems.
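
The remark that a list can be accessed as a pushdown or a queue can be illustrated with a brief Python sketch. The class names `PushdownAccess` and `QueueAccess` and the method names `store` and `retrieve` are assumptions made here; a `collections.deque` from the standard library plays the part of the list. Both access methods are attached to the same underlying structure, which is the separation the paragraph argues for.

```python
from collections import deque


class PushdownAccess:
    """Access method: the most recently stored item is retrieved first."""

    def __init__(self, structure):
        self._structure = structure

    def store(self, item):
        self._structure.append(item)

    def retrieve(self):
        return self._structure.pop()


class QueueAccess:
    """Access method: the earliest stored item is retrieved first."""

    def __init__(self, structure):
        self._structure = structure

    def store(self, item):
        self._structure.append(item)

    def retrieve(self):
        return self._structure.popleft()


shared = deque(["a", "b", "c"])      # one data structure
as_pushdown = PushdownAccess(shared)  # two access methods over it
as_queue = QueueAccess(shared)
```

Retrieving through `as_pushdown` removes the item at one end of `shared`, and retrieving through `as_queue` removes the item at the other end, without either class owning the data.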

We  will  now  use  an  analogy  with  language  theory  to  suggest  a 
theoretical  framework  for  studying  data  structures  and  data  conversion. 

We  have  seen  how  a  ddl  description  specifies  the  structure  of  a 
file,  such  that  if  we  are  given  the  values  for  a  set  of  data  items,  we 
can  encode  these  data  items  in  accordance  with  the  structure  to  obtain 
a bit string representation of the file. Let us define a "file
language" to be the (infinite) set of bit string representations of files
which could be obtained, depending on the number and values of the data
items, for a single ddl file description. We can regard each bit string
representation of a file as a sentence in a "file language". Thus,
a ddl description can be considered to be the syntax of a file language.
The ddl itself is therefore a metalanguage for specifying file languages.
Data conversion then is an exercise in syntax-directed translation,
in that we input the syntax of two file languages specified in the ddl
metalanguage and a sentence in one of the file languages. The output
of the conversion is then a sentence in the other file language. It
would be interesting to study file languages as linguistic entities.
Language theory may offer insights for either specifying data description
languages or producing a new model of data structures based on linguistic
concepts.

It is interesting to note that in one sense data conversion is a
generalization of natural or programming language translation. In data
conversion we need the association list to relate names in one data
structure description to names in the other data structure description.
The corresponding entity in natural language translation is a dictionary
which relates words in one language to words in the other language.
Clearly the association list is a more elaborate notion than a
dictionary - in fact, just the target attribute-source attribute pairs
themselves correspond to such a dictionary. The reason for this
elaborateness is that in language translation local structures are
preserved (e.g., clauses remain clauses, phrases remain phrases, and
sentences remain sentences), whereas in data conversion local structures
need not be preserved, in that data items in a target record may be
obtained from many source records. The notion of association list is
therefore a generalization of a language to language dictionary. This
might have in itself some linguistic implications.


Apart  from  the  general  ideas  outlined  above,  we  suggest  the 
following  research  possibilities  which  are  directly  related  to  the  model 
and  language  presented  herein. 

(1) To implement a GDDL interpreter and data convertor.

(2) To formalize the model more completely by abstracting the
implementation characteristics at each level.

(3) To obtain a mathematical or formal characterization of the
classes of data structures which can be converted from one
to another using the association list developed in Chapter 6.

(4) To enhance the user convenience of GDDL by incorporating such
features as extensibility or a macro capability into GDDL.
0. Basic Elements


0.1  The  Character  Set 

The Character Set is composed of digits, alphabetic characters,
and special characters. There are ten digits:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are decimal digits;

0, 1 are binary digits.

There are twenty-six alphabetic characters:

A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T,
U, V, W, X, Y, Z

There are twenty-nine special characters:

Character name        Character

comma                 ,
semicolon             ;
colon                 :
period                .
question mark         ?
exclamation           !
apostrophe            '
quote                 "
open parenthesis      (
close parenthesis     )
space
number sign           #
ampersand             &
asterisk              *
at the rate of        @
cents                 ¢
dollar sign           $
lozenge               ◊
underline             _
plus                  +
minus                 -
slash                 /
equals                =
less than             <
greater than          >
percent               %
logical AND           ∧
logical OR            |
logical NOT           ¬

The characters in the character set are the most primitive
elements of GDDL. The characters are combined to form strings called
names and strings called numbers.

0.2  Numbers 

An integer is a string of digits.

Examples: 0
          124
          03791

A signed integer is the character "+" or the character "-"
followed by an integer.

Examples: +1
          -124
          +03791

A decimal number is an integer or a signed integer, followed by
the character "." followed by an integer.

Examples: 9.012
          -102.0
          +1378.999

A  number  is  an  integer,  a  signed  integer  or  a  decimal  number. 
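
The three kinds of number can be checked mechanically. The following Python sketch is an illustration only: the pattern names and the function `classify` are hypothetical, and the patterns assume that the sign characters are "+" and "-" and that the separating character in a decimal number is the period, as the examples suggest.

```python
import re

INTEGER = r"[0-9]+"                            # a string of digits
SIGNED_INTEGER = r"[+-]" + INTEGER             # a sign followed by an integer
DECIMAL = r"[+-]?" + INTEGER + r"\." + INTEGER  # (signed) integer, period, integer


def classify(text):
    """Return the kind of number that text is, or None if it is not a number."""
    if re.fullmatch(INTEGER, text):
        return "integer"
    if re.fullmatch(SIGNED_INTEGER, text):
        return "signed integer"
    if re.fullmatch(DECIMAL, text):
        return "decimal number"
    return None
```

Under these definitions a string such as 1. is not a number, because the period must be followed by an integer, and leading zeros are allowed, as in 03791.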
0.3  Names 
An indexed user-defined name is an unindexed user-defined name
followed by an index.

Examples:  'NAME' (10)  'FALSE' (13) 

'-102.0' (1) 

A  user-defined  name  is  either  an  indexed  user-defined  name  or 
an  unindexed  user-defined  name. 

Examples: 'GROUP'  'NAME' (10)

A system name is any of the names listed in Index A.

0.4  Statements 

Statements  in  GDDL  have  a  fixed  format  consisting  of  a  system 
name,  followed  by  a  sequence  of  parameters  enclosed  in  parentheses. 

The statements are grouped together into four sections according
to their use:

1.  Record  specification  statements, 

2.  File  specification  statements, 

3.  Storage  specification  statements,  and 

4.  Conversion  statements. 

Each statement in GDDL is explained in a separate subsection,
in which the format of each statement is presented, followed by a
description of each parameter. A discussion of the usage of each
statement, together with examples, is then given.

Note: in presenting statement formats, the following conventions are
used:

(i) square brackets indicate optional parameters.

(ii) "parameter x, ..., parameter x" means that parameter
x may occur in the statement one or more times.

(iii) "[ parameter x, ..., parameter x ]" means that parameter
x may occur in the statement zero or more times.


1.  Record  Specification  Statements 


The following terminology will be used in presenting the Record
Specification statements:

(i) A field is a string of characters or binary digits
representing a data item.

(ii)  Two  fields  are  of  the  same  type  if  and  only  if  they  are 
referred  to  by  the  same  name  and  are  implemented  in  the  same  way. 

(iii)  A  group  is  an  organization  of  fields  and/or  other  groups. 

(iv)  Two  groups  are  of  the  same  type  if  and  only  if  they  have  the 
same  organization  and  are  implemented  in  the  same  way. 

(v)  A  record  is  a  group  which  is  to  be  used  as  a  basic  unit  of 
storage  and  retrieval. 

The Record Specification statements are used to specify the
organization and implementation of records in terms of groups and fields.
1.1 FIELD Statements


format

FIELD ( field name, character code, length type, length,
uniformity, data type [ ; F, alignment factor, alignment ]
[ ; V, alignment ] [ ; criterion ]
[ ; CONCODE statement, ..., CONCODE statement ] )

Note: The optional CONCODE statements will not be discussed
in this section. Their format and usage are discussed
in Section 1.4.3.


param¬ 

eters 


(i)  field  name  is  an  unindexed  user-defined  name. 

(ii)  character  code  is  either  the  system  name: 

B,  ASCII,  EBCDIC,  or  an  unindexed  user-defined  name, 

(iii)  length  type  is  either  the  system  name:  B  or  C. 

(iv)  length  is  either  the  string: 

n,  where  n  is  an  integer,  or  the  system  name  NOLIM. 

(v)  uniformity  is  either  the  system  name:  FIXED  (or 
simply  F)  or  VARIABLE  (or  simply  V) . 

(vi)  data  type  is  either  the  system  name: 

C,  or  the  string 

N  (base,  sign,  scale)  where: 

a)  base  is  the  string  n,  where  n  is  an  integer; 

b) sign is either the system name: R, RD, NS or
the string S (plus, minus) where plus and minus are
CONSTANT statements (see Section 1.4.2);

c) scale is either the system name FX or the string:
FL ( E: name, M: name ) where name is an unindexed
user-defined name, and either the string E: name
or M: name may appear first.

(vii) alignment factor is the string n, where n is an integer.

(viii) and (ix) alignment is the parameter string:
orientation, pad factor

where: orientation is either the system name L or R;
and pad factor is a CONSTANT statement (see Section 1.4.2).

(x)  criterion  is  an  unindexed  user-defined  name. 

The FIELD statement is used to specify a type of field.
That is, it specifies the name and implementation for fields
in terms of the following parameters:

(i) field name. This parameter gives the name for fields
of the type being specified.

(ii) character code. This parameter names the character
code to be used in representing fields. The parameter
is assigned the system name:

B, when the fields are to be implemented as bit strings,
ASCII, when the ASCII character code is to be used,
EBCDIC, when the EBCDIC code is to be used, and a
user-defined name, when a user-defined character code
is to be used. In such a case, the name must
appear as the first parameter in a CHAR statement.
The use of this option will be discussed in Section 1.4.1.


(iii) length type. The length of a field type is the number
of bits or characters that may be used to implement a
field of that type. This parameter is used to specify
whether length is given in bits or in characters. The
parameter is assigned the system name:

B, when length is given in bits, and

C, when length is given in characters of the code
specified by parameter (ii).

(iv) length. This parameter is used to specify the number
of characters or bits needed to implement a field. The
parameter is assigned the string:

n, when not more than n characters or bits are needed,
and the system name

NOLIM, when there is no limit on the length of the
fields.

(v)  uniformity.  This  parameter  is  used  to  specify  whether 
all  fields  of  the  type  being  specified  are  to  have  the 
same  length.  The  parameter  is  assigned  the  system  name: 

FIXED  (or  F),  when  the  length  of  each  field  is  the 
same,  and 

VARIABLE (or V), when the length of each field may be
different. In this case, if an integer is specified
for length it gives a maximum length for the
fields.

(vi) data type. This parameter is used to specify how the
fields are to be interpreted. The parameter is assigned
the system name:

C, when the fields are to be interpreted as character
strings in the character code specified by parameter
(ii), and the string

N ( base, sign, mode ) when the fields are to be interpreted
as numbers.

a) base. This parameter is assigned an integer which
gives the base of the number.

b) sign. This parameter is assigned the system name:

R, when negative numbers are implemented by the
radix complement,

RD, when negative numbers are implemented by
the diminished radix complement,

S ( plus, minus ), when the CONSTANT statements plus
and minus are used to specify which characters are
to be the plus and minus signs, and

NS, when no sign is to appear.

c) mode. This parameter is assigned the system name:

FX, when the number is a fixed point number, and
FL ( E: name, M: name ) when the number is
floating point. The exponent and mantissa of
the number are described as fields. The
names must appear as the first parameters of
FIELD statements. When the string E: name
appears first the exponent is to appear
before the mantissa, otherwise the mantissa
appears first.

(vii) and (viii) F; alignment factor, alignment

alignment factor. This parameter is used in specifying
the alignment of fields with respect to arbitrary boundaries
(e.g., word, half-word, byte boundaries). The
integer specifies the number of bits between alignment
boundaries.

alignment. This parameter specifies the actual alignment
of the field. The orientation parameter is
assigned the system name:

L, when the field is to end on a boundary, and

R, when the field is to begin on a boundary.

The pad factor parameter specifies the characters that
are to fill unused storage between the preceding field
and the field being aligned.

(ix) V; alignment. This parameter specifies the alignment
of the characters or bits of a field when the length is
longer or shorter than the number of bits or characters
reserved for it. The orientation parameter is assigned
the system name:

L, when the field is to be left aligned (truncation or
padding occurs at the right), and

R, when the field is to be right aligned (truncation
or padding occurs at the left).

The pad factor parameter specifies the characters or bits
that are to fill the unused space.

(x) criterion. This optional parameter names a criterion
which is to be applied to the fields of the type being
specified. If the criterion is not satisfied the field
does not occur in the record. The criterion name must
appear as the first parameter in a CRITERION statement.
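
Two of the parameters above can be illustrated with a short Python sketch: the V alignment (orientation and pad factor) and the sign conventions R and RD. The sketch is not part of GDDL. The function names align_value and complement, the use of Python strings for field contents, and the restriction to a one-character pad factor are assumptions made only for this illustration.

```python
def align_value(value: str, reserved: int, orientation: str, pad: str) -> str:
    """Fit `value` into `reserved` positions, as the V alignment describes."""
    if orientation not in ("L", "R"):
        raise ValueError("orientation must be 'L' or 'R'")
    if reserved < 0 or len(pad) != 1:
        raise ValueError("sketch assumes reserved >= 0 and a one-character pad")
    if orientation == "L":
        # Left aligned: truncation or padding occurs at the right.
        return value[:reserved].ljust(reserved, pad)
    # Right aligned: truncation or padding occurs at the left.
    kept = value[len(value) - reserved:] if reserved < len(value) else value
    return kept.rjust(reserved, pad)


def complement(magnitude: int, base: int, digits: int, sign: str) -> int:
    """Encode the negative of `magnitude` in `digits` positions of `base`.

    sign 'R'  : radix complement, base**digits - magnitude
    sign 'RD' : diminished radix complement, base**digits - 1 - magnitude
    """
    modulus = base ** digits
    if not 0 <= magnitude < modulus:
        raise ValueError("magnitude does not fit in the given number of digits")
    if sign == "R":
        return (modulus - magnitude) % modulus
    if sign == "RD":
        return modulus - 1 - magnitude
    raise ValueError("sign must be 'R' or 'RD' in this sketch")


if __name__ == "__main__":
    print(repr(align_value("YALE", 6, "L", " ")))   # padded at the right
    print(repr(align_value("PURDUE", 4, "R", " ")))  # truncated at the left
    print(complement(25, 10, 3, "R"), complement(25, 10, 3, "RD"))
```

For example, with base 10 and three digits, the value 25 is encoded as 975 under R and as 974 under RD.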

Example 1

Consider a set of fields, each of which is an ASCII character
string of variable length. Assume the maximum length of
each field is to be 30 characters. The fields are to be
referred to by the name 'COLLEGE' and are to be used as
names of colleges. The following statement specifies the type
for these fields.

FIELD ( 'COLLEGE', ASCII, C, 30, V, C )

where: 'COLLEGE' is the field name.

ASCII is the character code.

C specifies that length is given in ASCII characters.

30 specifies that length of fields cannot exceed 30
characters.

V specifies that lengths may vary from field to field.

C specifies that the fields are to be interpreted as
character strings.

The following character strings are fields of the type
specified in the above statement:

UNIVERSITY OF PENNSYLVANIA

PURDUE

YALE

Example 2

Consider a set of fields, each of which is an ASCII
character string of fixed length. Assume the length of each
field is 2 characters. The fields are to be referred to by
the name 'YEARS' and are to be used as the number of years
spent at a college.

The following statement specifies a field type for these
values:

FIELD ( 'YEARS', ASCII, C, 2, F, C )

The following character strings are fields of the type
specified in the above statement:

04

02

10
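
The two FIELD statements of Examples 1 and 2 can be mirrored in a small Python sketch that checks whether a character string is a field of a given type. The class name FieldType, its attribute names, and the use of None for NOLIM are assumptions of this sketch. Only the length, uniformity and ASCII checks are modelled; alignment, criteria and CONCODE delimiters are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldType:
    name: str
    character_code: str      # 'B', 'ASCII', 'EBCDIC', or a user-defined name
    length_type: str         # 'B' (bits) or 'C' (characters)
    length: Optional[int]    # None stands for NOLIM in this sketch
    uniformity: str          # 'F' (fixed) or 'V' (variable)
    data_type: str           # 'C' for character strings

    def accepts(self, value: str) -> bool:
        """Return True if `value` could be a field of this type."""
        if self.character_code == "ASCII" and not value.isascii():
            return False
        if self.length is None:
            return True                       # NOLIM: no limit on the length
        if self.uniformity == "F":
            return len(value) == self.length  # every field has the same length
        return len(value) <= self.length      # length is a maximum


# FIELD ( 'COLLEGE', ASCII, C, 30, V, C )
COLLEGE = FieldType("COLLEGE", "ASCII", "C", 30, "V", "C")
# FIELD ( 'YEARS', ASCII, C, 2, F, C )
YEARS = FieldType("YEARS", "ASCII", "C", 2, "F", "C")

if __name__ == "__main__":
    for text in ("UNIVERSITY OF PENNSYLVANIA", "PURDUE", "YALE"):
        print(text, COLLEGE.accepts(text))
    for text in ("04", "02", "10", "4"):
        print(text, YEARS.accepts(text))
```

Under these assumptions the three college names and the strings 04, 02 and 10 are accepted, while the one-character string 4 is rejected by the fixed-length type 'YEARS'.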


1.2 GROUP Statements

format

GROUP ( group name, group order; (list), ... , (list)
[ ; CONCODE statement, ... , CONCODE statement ] )

Note: The optional CONCODE statements will not be discussed
in this section. Their format and usage are discussed
in Section 1.4.3.

parameters

(i) group name is an unindexed user-defined name.

(ii) group order is either the system name: NOORD or SPEC.

(iii) list is a string of parameters with the following format:

name, occurrence, repetition number, repetition uniformity
[ ; O, order ] [ ; V, criterion name ]

where:

a) name is an unindexed user-defined name;

b) occurrence is either the system name: MANDATORY
(or simply M), or OPTIONAL (or simply O);

c) repetition number is either the string:
n, where n is an integer, or the system name
NOLIM;

d) repetition uniformity is either the system name:
FIXED (or simply F), or VARIABLE (or simply V);

e) the optional parameter, order, is either the system
name: ASCEND, or DESCND, or an unindexed user-defined
name;

f) the optional parameter, criterion name, is an unindexed
user-defined name.

The GROUP statement specifies a type of group. That is,
it specifies the structure and implementation of a set of
fields and/or other groups*, in terms of the following parameters:

(i) group name. This parameter is used to refer to the
type of group being specified.

(ii) order. This parameter gives the order in which
fields and/or subordinate groups are to occur in
the group being specified. The parameter is
assigned the system name:

NOORD, when the fields and/or subordinate groups
may occur in any order; and
SPEC, when they must occur in the order in which
they are named in parameter (iii).

(iii) (list), ..., (list). Each list specifies how a
field or subordinate group is to be implemented as
part of the group being specified.

a) name. This parameter names the type of a
field or subordinate group which is to be
included in the group. The name must appear
as the first parameter in a FIELD or a GROUP
statement.

* Groups included in other groups are called subordinate groups.

b) occurrence. This parameter specifies whether
fields or subordinate groups of the type named
by parameter (a) are optional or mandatory.
The parameter is assigned the system name:

MANDATORY (or M), when the fields or groups
of the type named must occur in each group,
and

OPTIONAL (or O), when the fields or groups are
not mandatory.

c) repetition number. This parameter specifies
the number of fields or subordinate groups
of the type named by parameter (a) that are to occur
in each group. The parameter is assigned the
string:

n, when at most n fields or subordinate groups
may occur, and

NOLIM, when any number may occur.

d) repetition uniformity. This parameter specifies
whether the parameter, repetition number, gives
the exact or the maximum number of fields or
subordinate groups. The parameter is assigned
the system name:

FIXED (or F), when the repetition number gives
the exact number, and

VARIABLE (or V), when it gives the maximum number.

e) order. This parameter specifies the order
for fields or subordinate groups of the type
named by parameter (a) when more than one
of the type is to occur (i.e., the repetition
number parameter is NOLIM, or n, greater
than 1). The parameter is assigned the system
name:

ASCEND, when parameter (a) names a field type
whose fields are to be arranged in ascending
order,

DESCND, when parameter (a) names a field type
whose fields are to be arranged in descending
order; and

an unindexed user-defined name, when parameter
(a) names a field or group type whose fields
or groups are to be arranged sequentially
according to a criterion which determines
for each pair of fields or groups which is
to occur first. The name must appear as
the first parameter in a CRITERION statement.

f) criterion name. This parameter names a
criterion which determines whether fields
or subordinate groups of the type named are
to occur. The name must appear as the first
parameter in a CRITERION statement.

Example 1

Consider two sets of fields of the following types:

(i) the set of fields of type 'COLLEGE' described in
Example 1 of Section 1.1, and
(ii) the set of fields of type 'YEARS' described in
Example 2 of Section 1.1.

These fields are to be organized into groups of a type 'COLINF'
where one field of type 'COLLEGE' is to be followed by one
field of type 'YEARS'. The following statements specify this
group type:

FIELD  (  'COLLEGE',  ASCII,  C,  30,  V,  C  ) 

FIELD  (  'YEARS',  ASCII,  C,  2,  F,  C  ) 

GROUP  (  'COLINF',  SPEC; 

(  'COLLEGE',  M,  1,  F  ), 

(  'YEARS',  M,  1,  F  )  ) 

The following character strings are groups of type
'COLINF':

(i) PURDUE 03

(ii) UNIVERSITY OF PENNSYLVANIA 04
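
The occurrence, repetition number, repetition uniformity and order parameters of a list can be read as constraints on the values that occur in a group. The following Python sketch checks those constraints for one list. The function name check_list, the use of None for NOLIM, the use of Python's default string comparison for ASCEND and DESCND, and the representation of a user-defined criterion as a two-argument function are all assumptions of this sketch rather than GDDL definitions.

```python
from typing import Callable, Optional, Sequence, Union

Order = Union[str, Callable[[str, str], bool], None]


def check_list(values: Sequence[str], occurrence: str,
               repetition: Optional[int], uniformity: str,
               order: Order = None) -> bool:
    """Check the occurrences of one field type against its list parameters."""
    count = len(values)
    if count == 0:
        # Assumption: only OPTIONAL entries may be absent altogether.
        return occurrence == "O"
    if repetition is not None:                 # None stands for NOLIM
        if uniformity == "F" and count != repetition:
            return False                       # exact number required
        if uniformity == "V" and count > repetition:
            return False                       # maximum number exceeded
    if order is None:
        return True
    if order == "ASCEND":
        precedes = lambda a, b: a <= b
    elif order == "DESCND":
        precedes = lambda a, b: a >= b
    elif callable(order):
        precedes = order                       # user-defined criterion
    else:
        raise ValueError("order must be ASCEND, DESCND or a function")
    # Every adjacent pair must satisfy the ordering criterion.
    return all(precedes(a, b) for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print(check_list(["PURDUE"], "M", 1, "F"))
    print(check_list([], "O", None, "V"))
    print(check_list(["01", "03", "04"], "M", None, "V", "ASCEND"))
```

In the 'COLINF' group above, each list is ( name, M, 1, F ), so exactly one value of each field type must occur.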

Example 2

Consider three sets of fields of the following types:

(i) a set of fields, each field of which is an ASCII
character string of variable length of 15 characters
maximum. The fields are to be referred to by the
name 'SURNAME' and are to be used as persons' names.

(ii) the set of fields of type 'COLLEGE' described
in Example 1 of Section 1.1.

(iii) the set of fields of type 'YEARS' described in
Example 2 of Section 1.1.

Fields of these types are to be organized into groups of type
'PERSDATA' where one field of type 'SURNAME' is to be followed
by zero or more occurrences of fields of types 'COLLEGE' and
'YEARS' organized into groups of type 'COLINF', described in
Example 1 above. The following statements specify this group
type:

FIELD  (  'SURNAME',  ASCII,  C,  15,  V,  C  ) 

FIELD  (  'COLLEGE',  ASCII,  C,  30,  V,  C  ) 

FIELD  (  'YEARS',  ASCII,  C,  2,  F,  C  ) 

GROUP  (  'COLINF',  SPEC; 

(  'COLLEGE',  M,  1,  F  ), 

(  'YEARS',  M,  1,  F  )  ) 

GROUP ( 'PERSDATA', SPEC;

( 'SURNAME', M, 1, F ),

( 'COLINF', O, NOLIM, V ) )

The following character strings are groups of type
'PERSDATA':

(i) DANIELS

(ii) DANIELS PURDUE 03

(iii) DANIELS PURDUE 03 UNIVERSITY OF PENNSYLVANIA 01
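
The nesting of 'COLINF' groups inside a 'PERSDATA' group can be sketched in Python as nested lists that are written out in the specified (SPEC) order. The names Value, render and persdata are hypothetical. Separating values by a single blank follows the way the examples are printed above and is an assumption, since no delimiters are declared in these statements.

```python
from typing import List, Tuple, Union

# A field is a string; a group is a list of fields and/or subordinate groups.
Value = Union[str, List["Value"]]


def render(value: Value) -> str:
    """Write out a field or group left to right, in the order specified."""
    if isinstance(value, str):
        return value
    return " ".join(render(part) for part in value)


def persdata(surname: str, colleges: List[Tuple[str, str]]) -> Value:
    """Build a 'PERSDATA' group: one 'SURNAME', then zero or more 'COLINF'."""
    group: List[Value] = [surname]
    for college, years in colleges:
        group.append([college, years])   # one 'COLINF' group
    return group


if __name__ == "__main__":
    print(render(persdata("DANIELS", [])))
    print(render(persdata("DANIELS", [("PURDUE", "03")])))
    print(render(persdata("DANIELS", [("PURDUE", "03"),
                                      ("UNIVERSITY OF PENNSYLVANIA", "01")])))
```

Run as a script, the sketch prints the three groups listed in (i), (ii) and (iii).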


1.3 RECORD Statements

format

RECORD ( record name, group name )

parameters

(i) record name is an unindexed user-defined name.

(ii) group name is an unindexed user-defined name.

usage of the statement

The RECORD statement is used to specify that a type of
group is to be treated as a record, i.e., it is to be used as
a basic unit of storage and retrieval.

(i) record name. This parameter gives the name for
records of the type being specified.

(ii) group name. This parameter is the name of the type
of group which is to be treated as a record.

Example

Consider groups of type 'PERSDATA' described in Example
2 of Section 1.2.

The following statement indicates that these groups are
to be treated as records (called 'PERSRCD'):

RECORD ( 'PERSRCD', 'PERSDATA' )

The complete description for this type of record, then,
is given by the following statements:

FIELD ( 'SURNAME', ASCII, C, 15, V, C )

FIELD ( 'COLLEGE', ASCII, C, 30, V, C )

FIELD ( 'YEARS', ASCII, C, 2, F, C )

GROUP ( 'COLINF', SPEC;

( 'COLLEGE', M, 1, F ),

( 'YEARS', M, 1, F ) )

GROUP ( 'PERSDATA', SPEC;

( 'SURNAME', M, 1, F ),

( 'COLINF', O, NOLIM, V ) )

RECORD ( 'PERSRCD', 'PERSDATA' )

The following groups are records of the type specified
by the above GDDL statements:

i) DANIELS

ii) DANIELS PURDUE 03

iii) DANIELS PURDUE 03 YALE 01


1.4  Record  Specification  Substatements 


The Record Specification substatements are statements which are
primarily used as parameters in Record Specification statements or are
referred to by Record Specification statements.

1.4.1  CHAR  Statements 

format

CHAR ( character code name; set name, set name )

parameters

(i) character code name is an unindexed user-defined name.

(ii) and (iii) set name is an unindexed user-defined name.

usage of the statement

The CHAR statement is used to specify a character code
for representing values. The new character code is specified
in terms of an existing one.

usage of the parameters

(i) character code name. This parameter gives the name
used in referring to the character code being specified.

(ii) and (iii) set name, set name. These parameters specify
the character code. They refer to SET statements which
list the binary code for each character, give the sort
order for the characters and relate the characters to
their corresponding codes in an existing character code.
This latter specification is necessary during conversion
when fields are converted from one character code to
another. Each name must appear as the first parameter
in a SET statement. The use of the SET statement in
specifying a character code is described in Section IM.

1.4.2 CONSTANT Statements

format

CONSTANT  (  character  string,  character  code  ) 

parameters

(i) character string is any string of characters or bits.
If the string contains the characters ( or ), then they
must be surrounded by apostrophes.

(ii) character code is either the system name B, ASCII,
EBCDIC, or a user-defined name.

usage of the statement and parameters

The CONSTANT statement is used as a parameter when
arbitrary character strings are required. The system name
CONSTANT and the parentheses serve to delimit the character
string for the DDL processor. For this reason, if parentheses
must occur in the character string, then they must be surrounded
by apostrophes.

The character code must be assigned the system name:

B, when the string contains only the characters 0 and 1
and these are to be interpreted as bits;

ASCII, when the string is to be interpreted as a string
of ASCII characters;

EBCDIC, when the string is to be interpreted as a string
of EBCDIC characters; and

a user-defined name, when the string is to be interpreted
as a string of characters over a user-defined character
code. The name must appear as the first parameter
in a CHAR statement.

Example

Consider the ASCII string: A(1). If this string is
to be used for a parameter, it must be entered as:

CONSTANT ( A'('1')', ASCII )

Note: Any blanks other than the first blank following the
open parenthesis will be considered part of the
character string.

1.4.3 CONCODE Statements

format

CONCODE ( delimiter, position )

parameters

(i) delimiter is a CONSTANT statement with the format
described in Section 1.4.2.

(ii) position is either the system name PRX, PTX, or INX.

usage of the statement

The CONCODE statement is used as a parameter in other
GDDL statements to specify delimiters in the form of character
strings. These delimiters can be used in records to determine
the beginnings or ends of variable length fields, to separate
repeating groups and fields, or to identify optional groups or
fields.

usage of the parameters

(i) delimiter. The CONSTANT statement gives the character
string which is to be used as the delimiter.

(ii) position. This parameter is used to specify the
position of the delimiter relative to the group or
field being delimited. It is assigned the system name:

PRX, when the delimiter is to precede (prefix) the
field or group,

PTX, when the delimiter is to follow (postfix) the
field or group, and

INX, when the delimiter is to be inserted between
(infix) repeating fields or groups.

Example 1

Consider the field type 'COLLEGE' specified in Example 1
of Section 1.1:

Each field of that type is an ASCII character string of
variable length. The maximum length is 30 characters. A
delimiter such as a comma followed by a blank can be used to
indicate the end of the field. The following statement specifies
a field type with such a delimiter:

FIELD ( 'COLLEGE-NAME', ASCII, C, 30, V, C;
CONCODE ( CONSTANT ( , , ASCII ), PTX ) )

where PTX indicates that a comma and a blank will
follow each field of type 'COLLEGE-NAME'.

The following character strings are fields of the type
specified in the above GDDL statement:

i) UNIVERSITY OF PENNSYLVANIA,

ii) PURDUE,

iii) YALE,

Example 2


Consider two field types:

i) the field type 'COLLEGE-NAME' specified in Example 1.

ii) the field type 'YEARS' specified in Example 2 of
Section 1.1:

Each field of that type is an ASCII character string of
a fixed length of 2 characters.

Fields of these two types are to be organized into groups
of a type 'COLLINF' such that one field of type 'COLLEGE-NAME'
is to be followed by one field of type 'YEARS'. Groups of type
'COLLINF' are to be organized as repeating groups into a group
of type 'COLLDATA' such that the 'COLLINF' groups are separated
from each other by a semicolon and a blank. The following statements
specify the group type 'COLLDATA':

FIELD ( 'COLLEGE-NAME', ASCII, C, 30, V, C;
CONCODE ( CONSTANT ( , , ASCII ), PTX ) )

FIELD ( 'YEARS', ASCII, C, 2, F, C )

GROUP ( 'COLLINF', SPEC;
( 'COLLEGE-NAME', M, 1, F ),
( 'YEARS', M, 1, F );
CONCODE ( CONSTANT ( ; , ASCII ), INX ) )

GROUP ( 'COLLDATA', SPEC;
( 'COLLINF', M, NOLIM, V ) )

The following character strings are groups of type
'COLLDATA':

i) PURDUE, 03

ii) PURDUE, 03; YALE, 01

iii) UNIVERSITY OF PENNSYLVANIA, 04; PURDUE, 03;
YALE, 01
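
The effect of the three position names can be sketched in Python by applying a delimiter to a list of values. The function names delimit and colldata are hypothetical, and the sketch assumes that delimiters are simply concatenated with the values they delimit, which is how the examples above are printed.

```python
from typing import List, Tuple


def delimit(values: List[str], delimiter: str, position: str) -> str:
    """Apply a CONCODE-style delimiter to a sequence of values."""
    if position == "PRX":                 # delimiter precedes each value
        return "".join(delimiter + v for v in values)
    if position == "PTX":                 # delimiter follows each value
        return "".join(v + delimiter for v in values)
    if position == "INX":                 # delimiter between repeating values
        return delimiter.join(values)
    raise ValueError("position must be PRX, PTX or INX")


def colldata(pairs: List[Tuple[str, str]]) -> str:
    """Write out a 'COLLDATA' group from (college, years) pairs."""
    groups = []
    for college, years in pairs:
        # 'COLLEGE-NAME' carries the postfix delimiter ", "; 'YEARS' has none.
        groups.append(delimit([college], ", ", "PTX") + years)
    # Repeating 'COLLINF' groups are separated by the infix delimiter "; ".
    return delimit(groups, "; ", "INX")


if __name__ == "__main__":
    print(colldata([("PURDUE", "03")]))
    print(colldata([("PURDUE", "03"), ("YALE", "01")]))
```

With the pairs (PURDUE, 03) and (YALE, 01) the sketch yields the string PURDUE, 03; YALE, 01 shown in ii) above.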
1.4.4 Parameter Statements

Parameter statements are statements that are used as parameters
in other GDDL statements. They may be used to specify any parameter.
These statements are used when the user wants a parameter to depend on
fields and/or other characteristics of his data structure. For example,
a user may want to have the length of a field equal to the number
of times another field repeats in a record. Parameter statements
allow such a relationship to be described.

When a parameter is to depend on a single field, this is specified
by replacing the parameter with the name of the field. Similarly,
when a parameter is to depend on a single other characteristic, this
is specified by replacing the parameter with either the LENGTH, COUNT,
or PARAMVAL statements described in the following sections. When a
parameter is to depend on a function of fields and characteristics,
the PARAMPROG statement of Section 1.4.4.4 is used.

Since more than one field or group of a particular type may
appear in a record (by repetition or by inclusion in different groups),
specific occurrences of the field or group type must be referred to in
a parameter statement. Such referencing is done by modifying the field
or group name in the following ways:

(i) if the field or group of type 'X' repeats, the nth value of
the field, or nth value of the group, is referred to by the indexed
name: 'X(n)' where n is an integer.

(ii) if the field or group of type 'X' occurs in more than one
group, say in groups 'T', 'U', and 'V', then the values of the field
or group of type 'X' in group 'U' are referred to by the name: 'X' OF 'U'.

As many of the phrases "OF group name" may be specified as are necessary
to distinguish between different values of the field or group
type. For example, in the above case, if the group of type 'U' occurs
in groups of types 'S' and 'R', then the name:

'X' OF 'U' OF 'S'

refers to the values of 'X' which occur in 'U' which occur in 'S'.

In defining different organizations of fields, it will occasionally
be necessary to indicate the record and the organization of
records (files) in which values of a field or group occur. Such referencing
is done by modifying the field or group name in the following
ways:

(iii) if the field or group of type 'X' occurs in a record
'RCD1', then the field or group is referred to by the name:

'X' OF 'RCD1'

If the field or group occurs in one or more groups, the group in question,
say of type 'U', is specified before the record name:

'X' OF 'U' OF 'RCD1'

(iv) if the record, 'RCD1', containing a field or group, 'X', is
organized in a file named 'Z', then the field or group is referred to
by the name:

'X' OF 'RCD1' OF 'Z'

(v) in general, whenever a structure, 'S1', occurs as part of
another structure, 'S2', then the structure 'S1' can always be referred
to by the name:

'S1' OF 'S2'

Names, as described in (i), (ii), (iii), (iv) and (v) above,
are called reference names.
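
The way a reference name selects values can be sketched in Python. The sketch assumes a hypothetical in-memory representation in which every structure is a dictionary mapping a type name to the list of its occurrences, and in which fields are strings. The function names parse_reference and resolve are not part of GDDL.

```python
import re
from typing import List, Optional, Tuple

# One quoted name, optionally indexed: 'X' or 'X(2)'
_PART = re.compile(r"^'([^'()]+)(?:\((\d+)\))?'$")


def parse_reference(reference: str) -> List[Tuple[str, Optional[int]]]:
    """Split "'X(2)' OF 'U' OF 'S'" into [('X', 2), ('U', None), ('S', None)]."""
    parts = []
    for text in re.split(r"\s+OF\s+", reference.strip()):
        match = _PART.match(text)
        if match is None:
            raise ValueError(f"not a quoted name: {text!r}")
        index = int(match.group(2)) if match.group(2) else None
        parts.append((match.group(1), index))
    return parts


def resolve(data: dict, reference: str) -> list:
    """Return the values a reference name refers to, outermost name first."""
    current = [data]
    for name, index in reversed(parse_reference(reference)):
        found = []
        for container in current:
            occurrences = container.get(name, [])
            if index is not None:
                # Indexed name: keep only the nth occurrence (n counts from 1).
                occurrences = occurrences[index - 1:index]
            found.extend(occurrences)
        current = found
    return current


if __name__ == "__main__":
    data = {"S": [{"U": [{"X": ["a", "b"]}]}],
            "R": [{"U": [{"X": ["c"]}]}]}
    print(resolve(data, "'X' OF 'U' OF 'S'"))
    print(resolve(data, "'X(2)' OF 'U' OF 'S'"))
```

Here 'X' OF 'U' OF 'S' selects only the values of 'X' inside the 'U' that occurs in 'S', leaving aside the 'X' that occurs in 'U' in 'R'.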


1.4.4.1 LENGTH Statements

format

LENGTH ( data name, length type )

parameters

(i) data name is a reference name.

(ii) length type is either the system name: B, ASCII,
EBCDIC, or an unindexed user-defined name.

usage of the statement

The LENGTH statement may be used as a parameter when
the parameter can be an integer (e.g., the length parameter
in the FIELD statement, or the repetition number parameter
in the GROUP statement). The LENGTH statement is used to
assign the length of the occurrence of a structure such as a
field or group to a parameter.

usage of the parameters

(i) data name. This parameter refers to a structure whose
length is to be used as a parameter.

(ii) length type. This parameter specifies whether the
length is to be given in terms of bits or characters.
The parameter is assigned the system name:

B, when length is to be given in bits,

ASCII, or EBCDIC when the length is to be given in
ASCII or EBCDIC characters, and

a user-defined name, when the length is to be given
in characters of a user-defined character code. The
name must appear as the first parameter in a CHAR
statement.

Example

Consider a field type 'COLLEGE' whose fields have variable
lengths of up to 30 ASCII characters maximum. Assuming
that fields of this type occur only once in a record, another
field type (say 'X') may be specified for the record, whose
fields are to have the same number of bits in length as 'COLLEGE'
fields have in characters. That is, if a field of type 'COLLEGE'
has a length of n characters in a record, a field of type 'X'
in that record has a length of n bits.

The following statement specifies such a field type:

FIELD ( 'X', B, B, LENGTH ( 'COLLEGE', ASCII ), V, C )

1.4.4.2 COUNT Statements

format

COUNT ( data name )

parameter

data name is a reference name.

usage of the statement and parameter

The COUNT statement may be used as a parameter when the
parameter can be an integer. The COUNT statement is used
when the parameter is to be assigned the number of times a
structure such as that of a field type, referred to by the data
name, occurs. If the structure does not occur, the parameter
is assigned the number 0.

Example

Consider a group type 'COLINF' which may occur zero or
more times in a record. The user may define a field type (say
'Y') whose fields are to be interpreted as ASCII character
strings with their lengths in characters equal to the number of
occurrences of 'COLINF'.

The following statement specifies this field type:

FIELD ( 'Y', ASCII, C, COUNT ( 'COLINF' ), V, C )

Thus, in a record if 'COLINF' repeats twice, then a
field of type 'Y' will have a length of 2 ASCII characters.
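
The LENGTH and COUNT statements can be sketched in Python as functions evaluated against one record. The record is represented here, as an assumption, by a dictionary from type names to lists of occurrences. The names length_param and count_param are hypothetical, and lengths in bits are deliberately not modelled because they depend on the character code.

```python
from typing import Dict, List


def length_param(record: Dict[str, List], name: str,
                 length_type: str = "ASCII") -> int:
    """LENGTH ( name, length type ): length of the occurrence of `name`."""
    occurrences = record.get(name, [])
    if len(occurrences) != 1:
        raise ValueError("this sketch assumes exactly one occurrence")
    if length_type == "B":
        raise NotImplementedError("bit lengths are not modelled in this sketch")
    return len(occurrences[0])      # length in characters


def count_param(record: Dict[str, List], name: str) -> int:
    """COUNT ( name ): number of occurrences, 0 if the structure is absent."""
    return len(record.get(name, []))


if __name__ == "__main__":
    record = {
        "COLLEGE": ["PURDUE"],
        "COLINF": [["PURDUE", "03"], ["YALE", "01"]],
    }
    # Length of 'X' in bits equals the length of 'COLLEGE' in characters.
    print(length_param(record, "COLLEGE", "ASCII"))
    # Length of 'Y' in characters equals the number of 'COLINF' groups.
    print(count_param(record, "COLINF"))
```

For the record shown, the length parameter of 'X' evaluates to 6 and that of 'Y' to 2.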
1.4.4. 3  PARAMVAL 


format  PARAMVAL  (  name,  parameter  number  ) 

param-  (i)  name  is  a  reference  name, 

eters 

(ii)  parameter  number  is  the  string  n,  where  n  is  an  integer. 

usage  The  PARAMVAL  states ent  may  be  used  as  a  parameter  when 

of 

the  the  parameter  is  to  depend  on  a  parameter  in  another  GDDL 

state¬ 
ment  statement . 


usage of the parameters  (i) name. This parameter identifies the
statement whose parameter is to be used. The name is the one that
appears first in the statement.

(ii) parameter number. This parameter identifies which
parameter in the statement identified by parameter (i)
is to be used.

Example  Consider a field type 'X' described as follows:

FIELD ( 'X', ASCII, C, NOLIM, V, C )

If the lengths of fields of a type 'Y' are to equal the
lengths of fields of type 'X', this is specified as follows:

FIELD ( 'Y', ASCII, C, PARAMVAL ( 'X', 4 ), F, C )

The  LENGTH  statement  can  also  be  used  to  specify  this 
relationship.  In  general,  this  statement  is  meant  to  describe 
relationships  that  are  not  covered  by  the  previous  parameter 
statements. 

1.4.4.4 PARAMPROG Statements

PARAMPROG  (  program  name;  parameter,  ...,  parameter  ) 

(i)  program  name  is  an  unindexed  user-defined  name. 

(ii)  parameter  is  either: 
a  reference  name, 
a  parameter  statement,  or 
a  CONSTANT  statement. 

The  PARAMPROG  statement  may  be  used  as  a  parameter  when 
that  parameter  is  to  be  a  function  of  the  other  parameters, 
values  and/or  constants. 

(i) program name. This parameter gives the name of a
program supplied by the user to compute the function
required.

(ii) parameter, ..., parameter. These parameters identify
the fields, parameters and constants for which the
function is computed.


Consider  a  program,  ACCUM,  which  inputs  two  values,  A 
and  B  and  computes  the  value  of  A  +  B. 

The following statement specifies that fields of type
'Z' have lengths equal to the sum of a field of type 'A'
and a field of type 'B':

FIELD ( 'Z', ASCII, C, PARAMPROG ( 'ACCUM', 'A', 'B' )
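A minimal sketch of how a PARAMPROG parameter might be resolved is given below. It is an illustration in Python, not GDDL: the registry `PROGRAMS`, the function `paramprog`, and the representation of a record as a dictionary of field values are all assumptions, and `accum` merely stands in for the user-supplied program ACCUM.

```python
def accum(a, b):
    """Stand-in for the user-supplied program ACCUM: returns A + B."""
    return a + b


# Hypothetical registry mapping program names to user-supplied programs.
PROGRAMS = {"ACCUM": accum}


def paramprog(program_name, *values):
    """Apply the named user program to the already evaluated parameters."""
    return PROGRAMS[program_name](*values)


def z_length(record):
    """Length of a 'Z' field: ACCUM applied to the 'A' and 'B' field values."""
    return paramprog("ACCUM", record["A"], record["B"])


print(z_length({"A": 3, "B": 4}))  # 7
```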


2. File Specification Statements

The following terminology will be used in presenting the File
Specification statements:

(i) A record is a group which is to be used as a basic unit
for storage and retrieval.

(ii) Two records are of the same type if and only if their
organization and implementation are specified by the same RECORD
statement.

(iii) There is a direct access path from a record (say, A) to
another record (say, B) if and only if

a) record B is positioned immediately after record A;

b) record A contains a pointer to record B; or

c) in a table of pointers, a pointer to record B is
positioned immediately after a pointer to record A.

Relative to this access path, record A is called the head
record, and record B is called the tail record.

(iv) Consider two records (say, C and D). There is an access
path of length n from C to D if and only if there are n-1 records
$R_1, \ldots, R_{n-1}$ such that there are direct access paths from
C to $R_1$, from $R_{n-1}$ to D, and from $R_i$ to $R_{i+1}$ for
$1 \le i \le n-2$.

(v) Criteria over fields, access paths and characteristics
of data are used to determine when a direct access path is to exist
from one record to another.

(vi) Two direct access paths are of the same type if and only
if they satisfy the same criteria and are implemented in the same way.

(vii) A file is a set of records with the direct access paths
between them.
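The definition of path length in (iv) can be sketched as a walk over direct access paths. The Python fragment below is only an illustration: the adjacency dictionary `direct_paths` and the function name `path_length` are assumptions, the search returns the shortest path when several exist (a choice the text does not make), and a result of zero for a missing path follows the convention stated for the PATH expression in Section 2.1.1.

```python
from collections import deque


def path_length(direct_paths, start, end):
    """Number of direct access paths on the shortest path, or 0 if none."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        record, length = queue.popleft()
        for tail in direct_paths.get(record, ()):
            if tail == end:
                return length + 1
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, length + 1))
    return 0


# Hypothetical file: head record -> tail records reachable by one direct path.
direct_paths = {"C": ["R1"], "R1": ["R2"], "R2": ["D"]}
print(path_length(direct_paths, "C", "D"))  # 3, via the two records R1 and R2
print(path_length(direct_paths, "D", "C"))  # 0, no such path
```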

The  File  Specification  statements  are  used  to  specify  a  file 
in  terms  of  criteria  for  determining  access  paths  and  the  implementation 
of  the  access  paths. 

The statements for specifying criteria are presented in Section
2.1. The LINK statement of Section 2.2 is used to specify the
implementation of direct access paths. The FILE statement of Section
2.3 specifies what different types of access paths are to exist for a
particular set of records.


2.1 Criteria Statements

Criteria  are  used  to  determine  when  a  direct  access  path  is  to 
exist.  Criteria  can  be  defined  over  fields,  characteristics  of  data 
(as  specified  by  the  parameters  of  GDDL  statements),  and  existing  access 
paths. 

In  describing  such  criteria,  it  may  be  necessary  to  indicate  that 
fields,  groups  or  characteristics  of  fields  or  groups  from  different 
records  of  the  same  type  are  being  compared.  To  refer  to  such  fields, 
groups  or  characteristics  unambiguously,  their  reference  names  are 
modified  in  the  following  way: 

The record name following the system name "OF" in the reference
name is replaced by the OCC statement of Section 2.1.2.2.

Reference names modified in this way are also called reference
names.

When a criterion is defined in terms of records other than the
head or tail record, it is necessary to specify whether:

a) the criterion is satisfied if it is satisfied for all records
other than the head and tail records, or

b) the criterion is satisfied if it is satisfied for at least
one record other than the head and tail records.

To specify these cases, the ALLOCC and SOMEOCC statements of
Sections 2.1.2.3 and 2.1.2.4 are used.

2.1.1 CRITERION Statements

format  CRITERION ( criterion name, criterion expression )

parameters  (i) criterion name is an unindexed user-defined name.

(ii) criterion expression is a string of the form:

a) ( arithmetic expression ) relation symbol
( arithmetic expression ) where:

1) relation symbol is either the system name:

EQ, or = ,
NQ,
LT, or < ,
LE, or ≤ ,
GT, or > ,
GE

2) arithmetic expression is a string of the form:

i) reference name;

ii) Parameter statement (see Section 1.4.4);

iii) CONSTANT statement (see Section 1.4.2);

iv) PATH ( record reference; record reference;
link name ) where:

1) record reference is a string of the form:
record name [ , criterion name ]
where record name is a reference name, and
criterion name is an unindexed user-defined name.

2) link name is an unindexed user-defined name.

v) ( arithmetic expression ) arithmetic
operator ( arithmetic expression )
where arithmetic operator is either
the string: +, -, x, or /.

b) ( data name ) MEM ( set name )

where: 1) data name is a reference name, and
2) set name is an unindexed, user-defined name.

c) NOT ( criterion expression form )

where criterion expression form is either an
unindexed user-defined name, or a criterion
expression.

d) ( criterion expression form ) AND ( criterion
expression form )

where criterion expression form is defined as in
c) above.

e) ( criterion expression form ) OR ( criterion
expression form )

where criterion expression form is defined as in
c) above.

f) ALLOCC statement. This form will be discussed in
Section 2.1.2.3.

g) SOMEOCC statement. This form will be discussed in
Section 2.1.2.4.

Criteria  on  fields,  characteristics  and  access  paths 
are  used  for  three  purposes: 

(1)  to  determine  when  records  are  to  be  linked  by 
access  paths; 

(2) to determine when particular fields and groups
which are optional in a record are to occur (see
the FIELD statement, parameter (x) and the GROUP
statement, parameter (iii) f); and

(3) to identify records for data conversion (see
Section 3).

A criterion is specified in terms of the following
parameters:

(i) criterion name. This parameter gives the name
to be used in referring to the criterion being
specified.

(ii) criterion expression. This parameter gives the
criterion. Each form this parameter may take
will be discussed separately.
a) ( arithmetic expression ) relation symbol
( arithmetic expression )

This form is used to specify that if the evaluation
of an arithmetic expression is:

EQ, or =, equal to,

NQ, not equal to,

LT, or <, less than,

LE, less than or equal to,

GT, or >, greater than, or

GE, greater than or equal to

the evaluation of a second arithmetic expression,
then the criterion will be satisfied.

An arithmetic expression is given by

i) a reference name when the evaluation of
the expression is to be the value of a field,

ii) a parameter statement when the evaluation of
the expression is to be the parameter for a
particular characteristic which had been
specified as depending on other values,
or as varying.

iii) a CONSTANT statement when the evaluation is
to be a particular constant.

iv) an expression of the form
PATH ( record reference; record reference;
link name )

when the evaluation of the expression is to
be the length of a path. If the path specified
by the expression does not exist, the
evaluation of the expression is, by convention,
zero. The parameters in this form are used as
follows:

1) record reference. This parameter specifies
the record at which the path is to begin.
Record name gives the record type (it may
also name a specific record of the type,
see Section 2.1.2.2). Criterion name
gives the name of a criterion used to
select the particular record desired.

2) record reference. This second parameter
specifies the record at which the path is
to end. This is done in a similar way to
the specification in parameter 1) above.

3) link name. This parameter specifies the
type of link which determines the path to
be tested. Link name must appear as the
first parameter in a LINK statement (see
Section 2.2).

v) an expression of the form

( arithmetic expression ) arithmetic operator
( arithmetic expression )
when the evaluation of the expression is to
be the evaluation of the first arithmetic
expression plus (+), minus (-), times (x),
or divided by (/) the evaluation of the
second arithmetic expression.

b) ( data name ) MEM ( set name )

This form is used to specify that the value of a
field referred to by the data name is a member of the
set referred to by set name. Set name must appear as
the first parameter in a SET statement (see Section
2.1.2.1).

c) NOT ( criterion expression form )

This form is used to specify that the negation
of a criterion expression is to be tested. The
criterion expression may be given directly or named.
If the criterion expression is named, the name must
appear as the first parameter in another CRITERION
statement.

d) ( criterion expression form ) AND ( criterion
expression form )

This form is used to specify that the logical
conjunction of two expressions is to be tested. The
criterion expressions may be given directly or named.
If the criterion expressions are named, the names
must appear as the first parameters in other CRITERION
statements.

e) ( criterion expression form ) OR ( criterion
expression form )

This form is used to specify that the logical
disjunction of two expressions is to be tested. The
criterion expressions may be given directly or named.
If the criterion expressions are named, the names
must appear as the first parameters in other CRITERION
statements.

Example 1  Consider an optional field 'DRAFT STATUS' in a record of
type 'PERSNL'. A user may test to see if this field occurs by

Example 2  Consider a record of type 'PERSNL' containing three fields:
'YEARS', 'STARTDATE', and 'ENDDATE', each of which is a numeric
field. A user may test to see if the value of 'YEARS' equals
the difference between the values of 'ENDDATE' and 'STARTDATE'
by specifying the following criterion (called 'TEST2'):

CRITERION ( 'TEST2', ( 'YEARS' OF 'PERSNL' ) EQ
( ( 'ENDDATE' OF 'PERSNL' ) - ( 'STARTDATE' OF
'PERSNL' ) ) )

Example 3  Consider the criterion described in Example 2,
above. The criterion which is the negation of this
can be specified in two ways (called 'TEST3.1' and 'TEST3.2'):

CRITERION ( 'TEST3.1', NOT ( 'TEST2' ) )

CRITERION ( 'TEST3.2', NOT ( ( 'YEARS' OF 'PERSNL' )
EQ ( ( 'ENDDATE' OF 'PERSNL' ) -
( 'STARTDATE' OF 'PERSNL' ) ) ) )
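The way relational, arithmetic and logical forms nest in a criterion expression can be sketched with a small evaluator. The Python below is an illustration only: encoding an expression as a nested tuple with the operator first, treating a bare string as either a field name or the name of another criterion, and the sample field values are all assumptions, and the PATH, MEM, ALLOCC and SOMEOCC forms are omitted.

```python
import operator

RELATIONS = {"EQ": operator.eq, "NQ": operator.ne, "LT": operator.lt,
             "LE": operator.le, "GT": operator.gt, "GE": operator.ge}
ARITHMETIC = {"+": operator.add, "-": operator.sub,
              "x": operator.mul, "/": operator.truediv}


def evaluate(expr, record, criteria):
    """Evaluate an arithmetic or criterion expression against one record."""
    if isinstance(expr, (int, float)):       # a constant
        return expr
    if isinstance(expr, str):                # a named criterion or a field
        if expr in criteria:
            return evaluate(criteria[expr], record, criteria)
        return record[expr]
    op = expr[0]
    if op == "NOT":
        return not evaluate(expr[1], record, criteria)
    left = evaluate(expr[1], record, criteria)
    right = evaluate(expr[2], record, criteria)
    if op == "AND":
        return left and right
    if op == "OR":
        return left or right
    if op in RELATIONS:
        return RELATIONS[op](left, right)
    return ARITHMETIC[op](left, right)


test2 = ("EQ", "YEARS", ("-", "ENDDATE", "STARTDATE"))
criteria = {"TEST2": test2, "TEST3.1": ("NOT", "TEST2"),
            "TEST3.2": ("NOT", test2)}
record = {"YEARS": 5, "STARTDATE": 1965, "ENDDATE": 1970}  # sample values

print(evaluate("TEST2", record, criteria))    # True
print(evaluate("TEST3.1", record, criteria))  # False
```

Naming a criterion and giving it directly are interchangeable in this sketch, which mirrors the two equivalent ways of writing the negation in Example 3.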
2.1.2 Criterion Substatements

The Criterion substatements are statements which either are used
as parameters in Criterion statements or are referred to primarily by
Criterion statements.

2.1.2.1 SET Statements

format  SET ( set name; member, ..., member )

parameters  (i) set name is a user-defined name.

(ii) member is a CONSTANT statement.

usage of the statement  The SET statement is used to specify a set of
strings over some character code. This set is specified by listing
each member and assigning a name to the collection. SET
statements are used together with CRITERION statements (see
Section 2.1.1) to specify set-theoretic criterion expressions.

SET statements may also be used to define character
codes (see Section 1.4.1). In this case two SET statements
are required. The first SET statement is used to specify the
individual character codes as bit strings. These are listed
in their sort order. The second SET statement is used to
relate these bit strings to their equivalents in an existing
code. This is done by listing the bit strings of the existing
code in the same order as their equivalents in the new code
were listed. If there are fewer bit strings in the existing
code, then the string *** is used in place of the lacking
bit string.

(i) set name. This parameter is the name of the set
being specified.

(ii) member. A CONSTANT statement is used to specify
each member of the set being defined.

Consider a set of character strings: BURNS, JACOBS,
MILLER, and SANDERSON. These are to be interpreted as authors'
names. The following statement assigns the name 'AUTHOR'
to this set.

SET ( 'AUTHOR'; CONSTANT ( BURNS, ASCII ),
CONSTANT ( JACOBS, ASCII ), CONSTANT
( MILLER, ASCII ), CONSTANT ( SANDERSON, ASCII ) )

The following criterion (called 'TESTSET')
tests whether the value of a field of type 'AUTH' in a record
of type 'BOOK' is a member of this set:

CRITERION ( 'TESTSET', ( 'AUTH' OF 'BOOK' )
MEM ( 'AUTHOR' ) )
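The MEM test in 'TESTSET' amounts to a set-membership check. The following Python sketch is illustrative only: the `SETS` registry, the `mem` function and the dictionary form of a 'BOOK' record are assumptions, and character-code handling through CONSTANT statements is not modelled.

```python
# Hypothetical registry of SET statements, keyed by set name.
SETS = {"AUTHOR": frozenset({"BURNS", "JACOBS", "MILLER", "SANDERSON"})}


def mem(record, data_name, set_name):
    """True if the field named data_name holds a member of the named set."""
    return record[data_name] in SETS[set_name]


print(mem({"AUTH": "MILLER"}, "AUTH", "AUTHOR"))  # True
print(mem({"AUTH": "SMITH"}, "AUTH", "AUTHOR"))   # False
```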
2.1.2.2 OCC Statements

OCC ( record name, occurrence name )

(i) record name is an unindexed user-defined name.

(ii) occurrence name is either the system name:

T, or
H, or

record variable name, where this name has the form
Xn, where n is an integer.

The OCC statement is used in a reference name to identify
a particular record of a given type in terms of the
following parameters:

(i) record name. This parameter names the type of record
being referred to.

(ii) occurrence name. This parameter specifies the
particular record that is being referred to. The
parameter is assigned the system name:

T,  if  the  record  is  the  tail  record  of  the  direct 
access  path. 

H,  if  the  record  is  the  head  record  of  the  direct 
access  path,  and 

Xn,  if  the  record  is  some  record  other  than  the  head 
or  tail  records.  For  each  such  distinct  record 
the  integer,  n,  must  be  different. 


Example 1  Consider the case where a type of direct access path is
being defined between records of a particular type, say
'PERSRCD', which is described in the Example of Section
1.1.3. If the criterion (say 'CRITEX1') determining the access
paths is defined in terms of the field 'SURNAME' in 'PERSRCD',
such that the value of 'SURNAME' in the head record is less
than or equal to the value of 'SURNAME' in the tail record,
then the following statements are used to specify each of these
particular records:

OCC ( 'PERSRCD', T )

OCC ( 'PERSRCD', H )

These statements are used in reference names to refer to the
value of the field 'SURNAME' in each 'PERSRCD' record as
follows:

'SURNAME' OF OCC ( 'PERSRCD', T )

'SURNAME' OF OCC ( 'PERSRCD', H )

These reference names would be used in a criterion expression,
to specify the criterion 'CRITEX1', as follows:

CRITERION ( 'CRITEX1',
( 'SURNAME' OF OCC ( 'PERSRCD', H ) )
LE ( 'SURNAME' OF OCC ( 'PERSRCD', T ) ) )

Example 2  Consider the case where a type of direct access path
is being defined between records of the type 'PERSRCD' as in
Example 1, above. In this case the criterion determining the
access paths is defined in terms of 'SURNAME' fields such
that two records are linked if and only if the values of
'SURNAME' in the head and tail records are related as in
Example 1, above, and in addition there is no other 'PERSRCD'
record in which the value of 'SURNAME' is less than the value
of 'SURNAME' in the tail record and greater than the value of
'SURNAME' in the head record. To describe this by a criterion
expression, reference must be made to 'PERSRCD' records other
than the head and tail records. The following statement
specifies this:

OCC ( 'PERSRCD', X1 )

This statement would be used in a reference name to refer to
the value of the field 'SURNAME' in the other 'PERSRCD' records
as follows:

'SURNAME' OF OCC ( 'PERSRCD', X1 )

This reference name would be used in a criterion expression
to specify the part of the criterion relating to the other
'PERSRCD' records as follows:

NOT ( ( ( 'SURNAME' OF OCC ( 'PERSRCD', X1 ) )
LT ( 'SURNAME' OF OCC ( 'PERSRCD', T ) ) )
AND ( ( 'SURNAME' OF OCC ( 'PERSRCD', H ) )
LT ( 'SURNAME' OF OCC ( 'PERSRCD', X1 ) ) ) )

Note: To completely specify this part of the criterion
the criterion expression above would appear as a
parameter in an ALLOCC statement. This is demonstrated
in the example given in Section 2.1.2.3 below.

2.1.2.3 ALLOCC Statements

ALLOCC ( record variable name, ..., record variable name;
criterion expression )

(i)  record  variable  name  is  a  string  of  the  form 
Xn,  where  n  is  an  integer. 

(ii)  criterion  expression  in  this  statement  is  the  same 
as  that  defined  for  parameter  (ii)  of  the  CRITERION 
statement. 

The  ALLOCC  statement  is  used  as  a  criterion  expression 
in  a  CRITERION  statement.  An  ALLOCC  statement  indicates  that 
the  criterion  expression  in  the  statement  is  satisfied  when 
it  is  true  for  all  records  identified  in  terms  of  the  record 
variable  names  given  by  the  first  parameters  of  the  statement. 

(i) record variable name, ..., record variable name.
These parameters specify that all records (other than
the head and tail records) of the type, identified via
the record variable name in the corresponding OCC
statement, must be tested to determine if the criterion
expression in which they appear is satisfied.

(ii) criterion expression. This parameter specifies the
criterion expression. It must contain at least one
OCC statement having a second parameter of the form
Xn for each record variable name of the form Xn in
parameter (i).

To completely specify the criterion described in Example
2 of Section 2.1.2.2, it is first necessary to state that all
records of type 'PERSRCD' must be tested to determine if the
criterion expression is satisfied. This is specified by the
following statement:

CRITERION ( 'CRITEX2', ALLOCC ( X1;
NOT ( ( ( 'SURNAME' OF OCC ( 'PERSRCD', X1 ) )
LT ( 'SURNAME' OF OCC ( 'PERSRCD', T ) ) )
AND ( ( 'SURNAME' OF OCC ( 'PERSRCD', H ) )
LT ( 'SURNAME' OF OCC ( 'PERSRCD', X1 ) ) ) ) ) )

Then, this criterion must be combined with the criterion
'CRITEX1' to form the conjunction:

CRITERION ( 'CRITEX3', ( 'CRITEX1' ) AND ( 'CRITEX2' ) )
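The effect of 'CRITEX3' can be traced on a handful of records. The Python sketch below is an assumption-laden illustration: records are plain dictionaries, the four surnames are sample data, the quantification over X1 is rendered with `all`, and a record is never paired with itself (a restriction the text does not state).

```python
def critex1(head, tail):
    """Head SURNAME is less than or equal to tail SURNAME."""
    return head["SURNAME"] <= tail["SURNAME"]


def critex2(head, tail, records):
    """ALLOCC over X1: no other record lies strictly between head and tail."""
    return all(
        not (x1["SURNAME"] < tail["SURNAME"]
             and head["SURNAME"] < x1["SURNAME"])
        for x1 in records
        if x1 is not head and x1 is not tail
    )


def critex3(head, tail, records):
    """Conjunction of CRITEX1 and CRITEX2."""
    return critex1(head, tail) and critex2(head, tail, records)


records = [{"SURNAME": name}
           for name in ("MILLER", "BURNS", "SANDERSON", "JACOBS")]

links = sorted(
    (head["SURNAME"], tail["SURNAME"])
    for head in records
    for tail in records
    if head is not tail and critex3(head, tail, records)
)
print(links)
# [('BURNS', 'JACOBS'), ('JACOBS', 'MILLER'), ('MILLER', 'SANDERSON')]
```

Each record is linked only to its immediate successor in surname order, which is the list structure the combined criterion is meant to produce.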
2.1.2.4 SOMEOCC Statements

SOMEOCC ( record variable name, ..., record variable name;
criterion expression )

(i) record variable name is a string of the form Xn, where
n is an integer.

(ii) criterion expression in this statement is the same as
that defined for parameter (ii) of the CRITERION statement.

The SOMEOCC statement is used as a criterion expression
in a CRITERION statement. A SOMEOCC statement indicates that
the criterion expression is satisfied when it is true for at
least one record identified in terms of the record variable
names given by the first parameters of the statement.

(i) record variable name, ..., record variable name.
These parameters specify that at least one record
of the type (other than the head and tail records)
identified, via the record variable name in the
corresponding OCC statement, must satisfy the criterion
expression in which it is named, if the criterion
expression is to be satisfied.

(ii) criterion expression. This parameter specifies the
criterion expression. It must contain at least one
OCC statement having a second parameter of the form
Xn for each record variable name of the form Xn in
parameter (i).

Example  The criterion 'CRITEX2', described in the Example of
Section 2.1.2.3, can also be described by stating that it
must not be the case that there is a single record of the type
'PERSRCD' which contains a value of 'SURNAME' that is greater
than the value of 'SURNAME' in the head record and less than the
value of 'SURNAME' in the tail record:

CRITERION ( 'CRITEX2-ALT',
NOT ( SOMEOCC ( X1;
( ( 'SURNAME' OF OCC ( 'PERSRCD', X1 ) )
LT ( 'SURNAME' OF OCC ( 'PERSRCD', T ) ) )
AND ( ( 'SURNAME' OF OCC ( 'PERSRCD', H ) )
LT ( 'SURNAME' OF OCC ( 'PERSRCD', X1 ) ) ) ) ) )
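That the ALLOCC form and the negated SOMEOCC form describe the same criterion can be checked mechanically. In this Python sketch, `all` and `any` stand in for ALLOCC and SOMEOCC; the helper names, the dictionary records and the sample surnames are assumptions made for the illustration.

```python
def between(x1, head, tail):
    """X1's SURNAME lies strictly between the head and tail SURNAME values."""
    return (x1["SURNAME"] < tail["SURNAME"]
            and head["SURNAME"] < x1["SURNAME"])


def others(records, head, tail):
    """Records other than the head and tail records."""
    return [r for r in records if r is not head and r is not tail]


def critex2(head, tail, records):
    """ALLOCC form: the negated test holds for every other record."""
    return all(not between(x1, head, tail)
               for x1 in others(records, head, tail))


def critex2_alt(head, tail, records):
    """NOT SOMEOCC form: no other record satisfies the test."""
    return not any(between(x1, head, tail)
                   for x1 in others(records, head, tail))


records = [{"SURNAME": name}
           for name in ("MILLER", "BURNS", "SANDERSON", "JACOBS")]

agree = all(
    critex2(h, t, records) == critex2_alt(h, t, records)
    for h in records
    for t in records
)
print(agree)  # True
```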

2.2  LINK  Statement 

LINK  (  link  name;  head  record  name,  tail  record  name; 
criterion  name,  implementation;  link  number,  link 
uniformity  ) 

(i)  link  name  is  an  unindexed  user-defined  name. 

(ii) head record name is an unindexed user-defined name.

(iii)  tail  record  name  is  an  unindexed  user-defined  name. 

(iv)  criterion  name  is  an  unindexed  user-defined  name. 


(v) implementation is either the system name:

SEQUEN; or the string

EMBED, pointer name [; P, length] [; R, count]
where pointer name is an unindexed user-defined
name, and length and count are strings of the
form n, where n is an integer; or the string

DIREC, pointer name, file name
where pointer name and file name are unindexed
user-defined names.

(vi) link number is either the string n, where n is an
integer, or the system name NOLIM.

(vii) link uniformity is either the system name:

FIXED (or simply F), or
VARIABLE (or simply V).

usage of the statement  The LINK statement is used to specify a type
of direct access path. That is, it specifies the criterion, which
determines when direct access paths are to exist, and the
implementation of these paths in terms of the following
parameters:

(i)  link  name.  This  parameter  gives  the  name  for  direct 
access  paths  of  the  type  being  specified. 

(ii) head record name. This parameter is used to name the
record type for the head record of the direct access
path. The name must appear as the first parameter in
a RECORD statement.

(iii) tail record name. This parameter is used to name the
record type for the tail record of the direct access
path. The name must appear as the first parameter in
a RECORD statement.

(iv) criterion name. This parameter is used to name the
criterion which determines when direct access paths
are to exist between two records. The name must appear
as the first parameter in a CRITERION statement.

(v) implementation. This parameter is used to specify the
implementation of the direct access paths. Implementation
is given by the system name:

SEQUEN - when the tail record is to be stored
sequentially after the head record of the path.

EMBED, pointer name [; P, length] [; R, count] -
when there is to be a pointer to the tail record
stored in the head record.

The parameter, pointer name, gives the name
of the pointer. This name must appear as the first
parameter in a POINTER statement (see Section 3.3).

The optional parameter, length, is used when a
maximum is to be placed on the length of any path
implemented by the pointers.

The optional parameter, count, is used when a
maximum is to be placed on the number of records
linked together by the pointers.

DIREC, pointer name, file name - when a pointer
to the tail record is to be stored in a table
after a pointer to the head record.

The parameter, pointer name, gives the name
of the pointer. This name must appear as the first
parameter in a POINTER statement (see Section 3.3).

The table must be described as a set of records
structured in some way by access paths. The parameter,
file name, names the specification of the
table. This name must appear as the first parameter
in a FILE statement (see Section 2.3).

(vi) link number. This parameter specifies the number
of tail records that may be linked by direct access
paths of the type currently specified to each head
record.

The parameter must be assigned the string n,
when either exactly n tail records, or a maximum of
n tail records, are to be linked to each head record.

The parameter is assigned the system name NOLIM,
when there is no limit on the number of tail records
linked to each head record.

(vii) link uniformity. This parameter specifies whether
the link number of parameter (vi) gives the exact
number of tail records linked to each head record.
The parameter is assigned the system name:

FIXED (or F) when there may be a fixed number
of tail records linked to each head record. In
this case link number gives the exact number.

VARIABLE (or V) when there may be a variable number
of tail records linked to each head record. In
this case the link number gives the maximum number
of the tail records.

Example  The following statement specifies a type of direct access
path (called 'LINK1'). These paths link records of type
'PERSRCD' (as specified in Example 1 of Section 1.3) and are
determined by the criterion 'CRITEX3' specified in the example
of Section 2.1.2.3. The statement determines a list structure
for the records which is implemented by sequential storage.

LINK ( 'LINK1'; 'PERSRCD', 'PERSRCD';
'CRITEX3', SEQUEN; 1, FIXED )
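The three implementation options describe the same head-to-tail relationship in different physical forms. The Python sketch below is an assumption-laden illustration of that point: list positions play the role of record addresses, the field name `NEXT` and the function names are hypothetical, and the optional length and count limits of EMBED are not modelled.

```python
def tails_sequen(storage):
    """SEQUEN: the tail record is stored immediately after its head record."""
    return [(head["SURNAME"], tail["SURNAME"])
            for head, tail in zip(storage, storage[1:])]


def tails_embed(storage, pointer_field="NEXT"):
    """EMBED: each head record holds a pointer (an index) to its tail."""
    return sorted(
        (head["SURNAME"], storage[head[pointer_field]]["SURNAME"])
        for head in storage
        if head[pointer_field] is not None
    )


def tails_direc(storage, table):
    """DIREC: in a pointer table, the tail's pointer follows the head's."""
    return [(storage[h]["SURNAME"], storage[t]["SURNAME"])
            for h, t in zip(table, table[1:])]


sequential = [{"SURNAME": "BURNS"}, {"SURNAME": "JACOBS"},
              {"SURNAME": "MILLER"}]
embedded = [{"SURNAME": "MILLER", "NEXT": None},
            {"SURNAME": "BURNS", "NEXT": 2},
            {"SURNAME": "JACOBS", "NEXT": 0}]
unordered = [{"SURNAME": "MILLER"}, {"SURNAME": "BURNS"},
             {"SURNAME": "JACOBS"}]
pointer_table = [1, 2, 0]

expected = [("BURNS", "JACOBS"), ("JACOBS", "MILLER")]
print(tails_sequen(sequential) == expected)               # True
print(tails_embed(embedded) == expected)                  # True
print(tails_direc(unordered, pointer_table) == expected)  # True
```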
2.3 FILE Statement

format  FILE ( file name; link name, ..., link name;
storage specification name; device specification name, ...,
device specification name )

(i)  file  name  is  an  unindexed  user-defined  name. 

(ii)  link  name  is  an  unindexed  user-defined  name. 

(iii)  storage  specification  name  is  an  unindexed  user- 
defined  name. 

(iv)  device  specification  name  is  an  unindexed  user-defined 
name. 


The FILE statement is used to specify a file by giving
the different types of access paths that exist among the
records and by relating this specification with a particular
storage structure. A separate FILE statement must be provided
for each set of records that is to be treated as a table
for pointers (see Section 2.2, parameter (v) - the DIREC
option).

(i)  file  name.  This  parameter  gives  the  name  used  in 
referring  to  the  file. 

(ii) link name, ..., link name. These parameters give
the names of the access paths which can exist in the
file. Each name must appear as the first parameter
in a LINK statement. There must be at least one LINK
statement for each record type which specifies a
criterion for sequencing that type of record relative
to one of the other types in the file.

(iii) storage specification name. This parameter specifies
which storage structure is associated with the file.

(iv) device specification name, ..., device specification
name. These parameters specify the devices on which
the file is to be stored.


3. Storage Specification Statements

Storage  devices  are  organized  by  arranging  certain  storage  units 
into  a  hierarchy,  so  that  the  storage  unit  at  each  level  of  the  hierarchy 
(with  the  exception  of  the  lowest  level)  can  be  decomposed  into  various 
storage  units  at  the  next  lower  levels.  For  example,  the  magnetic  disk 
consists of the following storage units: physical blocks, tracks, cylinders,
and disk packs. These storage units are arranged in a hierarchy
in  that  physical  blocks  are  organized  into  tracks,  tracks  are  organized 
into  cylinders,  and  cylinders  are  organized  into  a  particular  disk  pack. 

The following terminology will be used in presenting the Storage
Specification statements:

(i) A basic block is the storage unit at the lowest level of a
storage device (e.g., the physical blocks of a disk). Basic blocks consist of
storage positions such as card columns on a punched card, bits on a disk,
or 7- or 9-bit characters on a tape.

(ii)  A  block  is  the  storage  unit  at  any  of  the  higher  levels 
of  a  storage  device  (e.g.,  the  tracks  or  cylinders  of  a  disk).  Thus,  a 
block  is  an  organization  of  basic  blocks  and/or  other  blocks.  We  call 
this  organization  of  blocks  the  storage  structure. 

(iii) Two blocks or basic blocks are of the same type if and only
if they have the same structure and implementation.
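
The terminology above can be illustrated with a small Python sketch. The class names, the type names and the 1024-position block size are hypothetical; the sketch only shows a basic block as the lowest level and a block as an ordered organization of basic blocks and/or other blocks, using the disk hierarchy as the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicBlock:
    """Lowest-level storage unit, e.g. a physical block of a disk."""
    type_name: str
    positions: int  # number of storage positions (illustrative)


@dataclass(frozen=True)
class Block:
    """Higher-level storage unit made of basic blocks and/or other blocks."""
    type_name: str
    parts: tuple  # ordered members: BasicBlock or Block instances


def depth(unit):
    """Number of hierarchy levels at or below this storage unit."""
    if isinstance(unit, BasicBlock):
        return 1
    return 1 + max(depth(part) for part in unit.parts)


physical = BasicBlock("PHYSICAL-BLOCK", 1024)
track = Block("TRACK", (physical, physical))
cylinder = Block("CYLINDER", (track, track))
pack = Block("DISK-PACK", (cylinder,))
levels = depth(pack)  # physical block, track, cylinder, disk pack
```

Because the dataclasses are frozen and compared field by field, two units built with the same name and parts compare equal, which loosely mirrors the "same type" definition in (iii).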

The Storage Specification statements are used to specify:

(1)  the  storage  structure  for  a  file, 

(2)  the  implementation  of  this  storage  structure, 

(3) the positioning of the records of the file within the storage
structure, and

(4) the form of the record addresses which are to be used as
pointers in the file.


3.1 BBLOCK Statements


format 


BBLOCK ( basic block name; length, length uniformity;
record count, basic block count, count uniformity
[; SPLIT: record name, ..., record name ]
[; START: record name, ..., record name ]
[; HDR: header ] ... [; HDR: header ]
[; TLR: trailer ] ... [; TLR: trailer ] )


parameters


(i)  basic  block  name  is  an  unindexed  user-defined  name. 

(ii)  length  is  either  the  string  n,  where  n  is  an  integer, 
or  the  system  name  NOLIM. 

(iii) length uniformity is either the system name:

FIXED (or simply F), or
VARIABLE (or simply V).

(iv) record count is either the string n, where n is an
integer, or the system name NOLIM.

(v) basic block count is either the string n, where n is
an integer, or the system name NOLIM.

(vi) count uniformity is either the system name:

FIXED (or simply F), or
VARIABLE (or simply V).

(vii) and (viii) record name is an unindexed user-defined name.

(ix) header is either a CONSTANT statement with the format
described in Section 1.4.2, or an unindexed user-defined
name.

(x) trailer is either a CONSTANT statement with the
format described in Section 1.4.2, or an unindexed
user-defined name.

The  BBLOCK  statement  is  used  to  specify  a  type  of  basic 
block  in  the  storage  structure  and  the  positioning  of  records 
within  that  type  of  basic  block.  The  statements  of  Section 
3.4  determine  whether  such  basic  blocks  are  to  be  interpreted 
as  a  punched  card,  physical  tape  block,  etc. 

(i) basic block name. This parameter gives the name for
basic blocks of the type being specified.

(ii) length. This parameter gives the number of storage
positions in a basic block. The parameter is assigned
the string:

n, when the length of the basic block is to be at
most n storage positions; and

NOLIM, when there is to be no limit on the length of
the basic block.

(iii) length uniformity. This parameter specifies whether
every basic block of the type being specified is to have
the same length. The parameter is assigned the system
name:

FIXED (or F), when the length of all basic blocks is
the same; and

VARIABLE (or V), when the length of each basic block
is to differ. In this case, the length gives the
maximum number of storage positions in each basic
block.

(iv)  record  count,  and 

(v)  basic  block  count.  These  parameters  specify  the 
number  of  records  stored  in  a  basic  block  as  a  ratio 
of  records  per  basic  block.  The  two  parameters  are 
assigned  the  strings: 

n,  m  -  when  n  records  (or  at  most  n  records)  are  to 
be  placed  in  m  basic  blocks. 

1,  NOLIM  -  when  single  records  of  varying  length 

may  exceed  the  length  of  a  basic  block,  and  each 
record  is  to  begin  in  a  new  basic  block. 

NOLIM,  1  -  when  there  is  no  limit  on  the  number  of 
records  to  be  placed  in  a  single  basic  block. 

(vi) count uniformity. This parameter specifies whether
the number of records per basic block is fixed. The
parameter must be assigned the string:

FIXED (or F), when the number of records per block is
fixed, and

VARIABLE (or V), when the number of records per block
is to vary. In this case, record count gives the
maximum number of records.

(vii) SPLIT: record name, ..., record name. These parameters
specify those record types which may be split
between basic blocks when there are not enough storage
positions for an entire record. Each record name must
appear as the first parameter in a RECORD statement.

(viii) START: record name, ..., record name. These parameters
are used to specify those record types which may
occur first in a basic block. Each record name must
appear as the first parameter in a RECORD statement.

If no record names are specified, the basic block is
to contain no records of the file.

(ix) HDR: header. This optional parameter is used to specify
control information which occurs at the head of a basic
block. The statements describing the individual devices
(see Section 3.4) specify whether the header is to be
interpreted as a start card for a card deck, or a label
for a tape, etc. The parameter is assigned:

a CONSTANT statement, when the character string contained
in the header is to be given directly, and

a user-defined name, which refers to a group or field
type, when the character string content is not given
directly. The group or field type specification is
used to specify characteristics such as length and
delimiters which can be used to identify the header
when this is necessary. Although a GROUP or FIELD
statement may be used to describe a header, it is
not a group or field in the sense that these are
components of records. In this instance these
statements are used in a second way which should
not be confused with their primary use in record
specification. This parameter is only used when
describing the structure of a medium which provides
labels at the lowest level.

(x) TLR: trailer. This optional parameter is used to
specify control information which occurs at the tail
of a basic block. It is specified and used in the
same way as the header parameter, described as parameter
(ix), above.

Examples  of  the  use  of  the  BBLOCK  statement  will  be 
given  in  the  section  discussing  the  individual  devices. 
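
The interplay of length, record count and basic block count can be made concrete with a short Python sketch. It assumes, for illustration only, that headers and trailers take no space and that NOLIM is modelled as None; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NOLIM = None  # stand-in for the system name NOLIM


@dataclass
class BBlockSpec:
    name: str
    length: Optional[int]        # storage positions per basic block, or NOLIM
    record_count: Optional[int]  # n in "n records per m basic blocks"
    bblock_count: Optional[int]  # m in the same ratio


def records_fit(spec, record_length):
    """True if record_count records of record_length positions fit into
    bblock_count basic blocks of the stated length (headers ignored)."""
    if NOLIM in (spec.length, spec.record_count, spec.bblock_count):
        return True  # no limit stated, so there is nothing to check
    return spec.record_count * record_length <= spec.bblock_count * spec.length


# One 160-position record spread over two 80-position basic blocks.
card = BBlockSpec("CARD", 80, 1, 2)
fits = records_fit(card, 160)
```

A check of this kind only covers the ratio n, m; whether a record may actually cross a basic block boundary is governed separately by the SPLIT parameter.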
3.2 BLOCK Statements

format

BLOCK ( block name [, addressing scheme ];
( list ), ..., ( list ) [; HDR: header ] ... [; HDR: header ]
[; TLR: trailer ] ... [; TLR: trailer ] )

parameters

(i) block name is an unindexed user-defined name.

(ii) addressing scheme is a string of parameters of the
form:

address length, address order, base, start address

where:

a) address length is a string of the form
n, where n is an integer;

b) base is a string of the form
r, where r is an integer;

c) start address is a CONSTANT statement with the
format described in Section 1.4.2.

(iii) list is a string of parameters of the form:

b/block name, occurrence, repetition number, repetition
uniformity [, address level, address scope ]

where:

a) b/block name is an unindexed user-defined name;

b) occurrence is either the system name:

MANDATORY (or simply M), or
OPTIONAL (or simply O);

c) repetition number is either the string n, where
n is an integer, or the system name NOLIM;

d) repetition uniformity is either the system name:

FIXED (or simply F), or
VARIABLE (or simply V);

e) address level is either the system name CL, or NL;

f) address scope is either the system name WOT, or WT.

(iv) header is either a CONSTANT statement with the format
described in Section 1.4.2, or an unindexed user-defined
name.

(v) trailer is either a CONSTANT statement with the format
described in Section 1.4.2, or an unindexed user-defined
name.


usage of the statement

The BLOCK statement is used to specify a type of block
in the storage structure. That is, it specifies the structure
and implementation of a set of basic blocks and/or other blocks*
in terms of the following parameters:

* Blocks included in other blocks are called subordinate blocks.

(i) block name. This parameter gives the name for
blocks of the type being specified.

(ii) addressing scheme. This parameter is used to specify
the addressing scheme associated with the basic
blocks and/or subordinate blocks in the block being
specified in terms of the following parameters:

a) address length. This parameter specifies
the length of the bit string needed to
represent the address;

b) base. This parameter gives the base of the
number system in which addresses are to be
given.

c) start address. This parameter gives the
address to be used for the first block or
basic block to which the scheme is applied.

(iii) (list), ..., (list). Each list specifies how a
basic block or subordinate block is to be implemented
as part of the block being specified. The
order in which the basic blocks and subordinate
blocks are listed gives the order in which they
are to occur.

a) name. This parameter names the type of a
basic block or subordinate block.

b) occurrence. This parameter specifies whether
basic blocks or subordinate blocks of the type
named by parameter (a) are optional or mandatory.
The parameter is assigned the system
name:

MANDATORY (or M), when the basic blocks or
subordinate blocks of the type named must
occur in each block, and

OPTIONAL (or O), when the basic blocks or
subordinate blocks are not mandatory.

c) repetition number. This parameter specifies
the number of basic blocks or subordinate
blocks of the type named by parameter (a)
that are to occur in each block. The parameter
is assigned the string:

n, when at most n basic blocks or subordinate
blocks may occur, and

NOLIM, when any number may occur.

d) repetition uniformity. This parameter specifies
whether the parameter, repetition number, gives
the exact or the maximum number of basic blocks
or subordinate blocks. The parameter is
assigned the system name:

FIXED (or F), when the repetition number gives
the exact number, and

VARIABLE (or V), when it gives the maximum number.

e) address level. This parameter specifies
whether the addressing scheme given by
parameter (ii) applies to blocks of the type
named by parameter (a), or to basic blocks or
subordinate blocks included in that block.
The parameter is assigned the system name:

CL, when a basic block type is named by
parameter (a) or a subordinate block type
is named and the addressing scheme is to
be applied to blocks of these types; and

NL, when the addressing scheme is to be
applied to the basic blocks at the next
level.

f) address scope. This parameter specifies
how the basic blocks or subordinate blocks
of the type named by parameter (a) are to
be addressed, relative to other basic blocks
or subordinate blocks in the block. The
parameter is assigned the system name:

WOT, when the type named is addressed in
sequence with the other types in the block,
and

WT, when it is to be addressed only within
the type named by parameter (a).

(iv) HDR: header. This parameter is used to specify
control information which occurs at the head of
a block. The statements describing the individual
devices (see Section 3.4) specify whether the
header is to be interpreted as a label for a
track, etc. The usage of this parameter is
described in Section 3.1, parameter (ix).

(v) TLR: trailer. This parameter is used to specify
control information which occurs at the end of
a block. Its usage is described in Section 3.1,
parameter (x).

Examples  of  the  use  of  the  BLOCK  statement  will  be  given 
in  the  section  discussing  the  individual  devices. 
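
The difference between the two address scopes can be sketched in Python. The sketch reflects one reading of the descriptions above and is not a definitive semantics: WOT members are assumed to share a single address sequence across all types in the block, while WT members are assumed to be numbered separately within their own type. The function name and the use of integer addresses are illustrative assumptions.

```python
from itertools import count


def assign_addresses(members, start=0):
    """members: ordered list of (type_name, scope) pairs, scope 'WOT' or 'WT'.

    Returns a list of (type_name, address) pairs in the same order.
    """
    shared = count(start)  # one sequence shared by all WOT members
    per_type = {}          # a separate sequence per type for WT members
    out = []
    for type_name, scope in members:
        if scope == "WOT":
            out.append((type_name, next(shared)))
        elif scope == "WT":
            counter = per_type.setdefault(type_name, count(start))
            out.append((type_name, next(counter)))
        else:
            raise ValueError(f"unknown address scope: {scope}")
    return out


in_sequence = assign_addresses([("A", "WOT"), ("B", "WOT"), ("A", "WOT")])
within_type = assign_addresses([("A", "WT"), ("B", "WT"), ("A", "WT")])
```

Under this reading the second 'A' receives address 2 with WOT but address 1 with WT, since the intervening 'B' does not advance the sequence for type 'A'.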


3.3 POINTER Statements


format  POINTER  (  pointer  name;  pointer  type;  pointer  mode; 
pointer  form  ) 


parameters

(i) pointer name is an unindexed user-defined name.

(ii) pointer type is either the system name:

MAIN, or
DEVICE.

(iii) pointer mode is either the system name:

ABS, or the string
REL, origin

where origin is a reference name.

(iv) pointer form is either an unindexed user-defined name,
or a string of the form:

field name; b/block name, ..., b/block name

where field name and b/block name are unindexed user-defined
names.


usage of the statement

The POINTER statement is used to specify the implementation
of a pointer in terms of the following parameters:

usage of the parameters

(i) pointer name. This parameter gives the name of the
type of pointer being specified.

(ii) pointer type. This parameter is used to specify
whether the pointers give the main memory addresses
of records (for files to be used in main memory) or
the device addresses of records. The parameter is
assigned the system name:

MAIN, when the pointers are to give main memory
addresses, and

DEVICE, when the pointers are to give device addresses.

(iii) pointer mode. This parameter is used to specify
whether pointers give the absolute address of records,
or the address relative to some record or block. The
parameter is assigned the system name:

ABS, when pointers are to give the absolute address
of records, and

REL, origin, when pointers are to give addresses relative
to some record, when the pointers give main
memory addresses, or relative to some block when
the pointers give device addresses.

The parameter, origin, gives the record or
block type to be used as the origin.

(iv) pointer form. This parameter gives the form of a
pointer in terms of the field in which it is stored,
and, in the case of a pointer giving a device address,
the addressing schemes for each device level used to
form the pointer. The parameter is assigned:

an unindexed user-defined name, when the pointers
give main memory addresses. This identifies
the field which is to contain the pointer and must
appear as the first parameter in a FIELD statement;

a string of the form:
field name; block name, ..., block name
when the pointers give device addresses. The field
name names the field which is to contain the pointer.
The block names refer to the BLOCK statements in
which the addressing schemes are specified for each
block address which is to make up the total address
of a record.

Example

Consider a file, stored on disk, whose records are
connected by embedded pointers (called 'PTR') which give the
absolute cylinder and track locations of the records. Assume
the block type for cylinders is 'CYL' and the block type for
tracks is 'TRK'. The following statement specifies the implementation
of these pointers for the file:

POINTER ( 'PTR'; DEVICE; ABS; 'FIELD1'; 'CYL', 'TRK' )
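
A device pointer of this kind is built from one address per device level. The Python sketch below assumes that the level addresses are simply concatenated as fixed-width bit strings, outermost level first; the text does not state how the parts are combined, and the 8-bit and 4-bit widths are invented for the illustration.

```python
def compose_address(levels):
    """levels: list of (address, address_length_in_bits), outermost first.

    Returns the concatenated bit string (concatenation is an assumption).
    """
    bits = ""
    for address, length in levels:
        if address < 0 or address >= 1 << length:
            raise ValueError(f"address {address} does not fit in {length} bits")
        bits += format(address, f"0{length}b")
    return bits


# Cylinder 5 in 8 bits followed by track 3 in 4 bits (illustrative widths).
pointer = compose_address([(5, 8), (3, 4)])
```

In terms of the example, the first pair would correspond to the addressing scheme of 'CYL' and the second to that of 'TRK'.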
3.4 Device Statements

Device statements are used to relate storage structure specifications
to particular devices. The Device statements associate with each
physical level of the device those block and basic block types which
correspond to that level. The Device statements are also used to specify
other medium dependent characteristics such as unit assignment and tape
density for magnetic tape. There is one Device statement for each different
kind of device.

3.4.1 CARD Statements


format  CARD  (  card  specification  name  [,  unit  assignment  name  ]; 

DECK:  block  name;  CARD:  bblock  name,  ...,  bblock  name  ) 

parameters

(i) card specification name is an unindexed user-defined
name.

(ii) unit assignment name is a CONSTANT statement with
the format described in Section 1.4.2.

(iii) block name, and

(iv) bblock names are unindexed user-defined names.

usage of the statement

The CARD statement is used to relate block and basic
block types of a storage structure to the medium of punched
cards in terms of the following parameters:

usage of the parameters

(i) card specification name. This parameter is used
to refer to the relation being specified between a
storage structure and the medium of cards.

(ii) unit assignment name. This parameter is used to specify
the name of the device unit to be used (if selection
is allowed). The unit should be identified using the
naming scheme of the operating system. If no name is
specified, system conventions are used to select the
particular card I/O device to be used.

(iii) DECK: block name. This parameter is used to identify
the block type in the storage structure which corresponds
to the entire deck of cards. The block name
must appear as the first parameter in a BLOCK statement.
Headers and trailers specified in this BLOCK
statement are interpreted as single cards.

(iv) CARD: bblock name, ..., bblock name. These parameters
are used to identify those basic block types in the
storage structure which correspond to single punched
cards. Each bblock name must appear as the first
parameter in a BBLOCK statement with the following
parameters:

a)  parameter  (ii)  is  80. 

b)  parameter  (iii)  is  FIXED  (or  F) . 

c)  parameters  (ix)  and  (x)  do  not  appear. 
Example

The following statements relate a storage structure
to the medium of punched cards for a file which is to contain
records of a type 'X' having a fixed length of 160 characters
and stored 1 record to 2 cards. There is only one type of
basic block in the storage structure and one header and
trailer card. The header card contains the string: START
OF DATA, starting in column 1. The trailer card contains
the string: END OF DATA, starting in column 1.

CARD ( 'INPUT CARDS';
DECK: 'CARD-DECK'; CARD: 'CARD' )

BLOCK ( 'CARD-DECK'; SPEC;
( 'CARD', M, NOLIM, V );
HDR: CONSTANT ( START OF DATA, ASCII );
TLR: CONSTANT ( END OF DATA, ASCII ) )

BBLOCK ( 'CARD'; 80, FIXED; 1, 2, FIXED;
START: 'X' )
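
The three conditions that the CARD statement places on its BBLOCK statements (length 80, FIXED length uniformity, no HDR or TLR parameters) lend themselves to a simple check. The Python sketch below is illustrative only; the function name and argument layout are assumptions, not part of the language.

```python
def check_card_bblock(length, uniformity, headers=(), trailers=()):
    """Return a list of violations of the CARD rules for a BBLOCK statement.

    An empty list means the BBLOCK statement may be named after CARD:.
    """
    problems = []
    if length != 80:
        problems.append("length must be 80")
    if uniformity not in ("FIXED", "F"):
        problems.append("length uniformity must be FIXED")
    if headers or trailers:
        problems.append("HDR and TLR must not appear")
    return problems


# The BBLOCK statement for 'CARD' in the example: 80, FIXED, no HDR or TLR.
ok = check_card_bblock(80, "FIXED")
bad = check_card_bblock(72, "V", headers=("LABEL",))
```

The BBLOCK statement of the example passes, while a 72-position variable-length basic block with a header fails all three conditions.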

3.4.2  TAPE  Statements 

format  TAPE  (  tape  specification  name  [,  D:  density  ] 

[,  unit  assignment  name  ]; 

TAPE:  block  name;  TAPE  BLOCK: 
bblock name, ..., bblock name )

parameters

(i) tape specification name is an unindexed user-defined
name.

(ii) density, and

(iii) unit assignment name are CONSTANT statements with the
format described in Section 1.4.2.

(iv) block name, and

(v) bblock names are unindexed user-defined names.

usage of the statement

The TAPE statement is used to relate block and basic
block types of a storage structure to the medium of magnetic
tape in terms of the following parameters:

usage of the parameters

(i) tape specification name. This parameter is used to
refer to the relation being specified between a storage
structure and the medium of magnetic tape.

(ii) density. This parameter is used to specify tape
density for tape units. The CONSTANT statement is used
to give the density in the form required by the operating
system.

(iii) unit assignment name. This parameter is used to specify
the particular tape drive to be used. The unit should
be specified using the CONSTANT statement according to
the naming scheme of the operating system.

(iv) TAPE: block name. This parameter identifies the block
type in the storage structure which corresponds to an
entire magnetic tape. The block name must appear as
the first parameter in a BLOCK statement. Headers
and trailers specified in this BLOCK statement are
interpreted as single tape blocks.

(v) TAPE BLOCK: bblock name, ..., bblock name. These
parameters are used to identify those basic block types
in the storage structure which correspond to single
physical tape blocks. Each bblock name must appear
as the first parameter in a BBLOCK statement. Headers
and trailers specified in this BBLOCK statement are
interpreted as the beginning or ending storage positions
in the physical tape block.

Example

The following statements relate a storage structure to
a magnetic tape. The storage structure is to contain a file
stored in fixed length physical blocks of 2064 bytes. Each
physical block is to contain 25 fixed length records of type
'X'. The file is to be preceded by two 80 byte ASCII header
labels, 'STANDARD VOLUME LABEL' and 'STANDARD FILE LABEL',
and a 1 byte ASCII tapemark 'TM'; and followed by a single
80 byte ASCII label 'STANDARD FILE TLR LABEL' and two 2 byte
ASCII tapemarks 'TM'. Each physical block is to be preceded
by a 16 byte ASCII header 'KEY'.


TAPE ( 'TAPE1'; TAPE: 'FILE-BLOCK';
TAPE BLOCK: 'TAPE-BLOCK' )

BLOCK ( 'FILE-BLOCK'; SPEC;
( 'TAPE-BLOCK', M, NOLIM, V );
HDR: 'STANDARD VOLUME LABEL';
HDR: 'STANDARD FILE LABEL';
HDR: 'TM';
TLR: 'STANDARD FILE TLR LABEL';
TLR: 'TM';
TLR: 'TM' )

BBLOCK ( 'TAPE-BLOCK'; 2064, FIXED;
25, 1, FIXED; START: 'Y';
HDR: 'KEY' )

FIELD ( 'STANDARD VOLUME LABEL', ASCII, C, 80, F, C )
FIELD ( 'STANDARD FILE LABEL', ASCII, C, 80, F, C )
FIELD ( 'TM', ASCII, C, 1, F, C )
FIELD ( 'STANDARD FILE TLR LABEL', ASCII, C, 80, F, C )
FIELD ( 'KEY', ASCII, C, 18, F, C )

3.4.3 DISK Statements

format

DISK ( disk specification name [, unit assignment name ];
DISK: block name; CYLINDER: cylinder name, ..., cylinder
name; TRACK: track name, ..., track name;
TRACK BLOCK: bblock name, ..., bblock name )

parameters

(i) disk specification name is an unindexed user-defined
name.

(ii) unit assignment name is a CONSTANT statement with the
format described in Section 1.4.2.

(iii) block name,

(iv) cylinder name,

(v) track name, and

(vi) bblock name are unindexed user-defined names.

usage of the statement

The DISK statement is used to relate block and basic
block types of a storage structure to the medium of a disk
in terms of the following parameters:

usage of the parameters

(i) disk specification name. This parameter is used to
refer to the relation being specified between a storage
structure and the medium of a disk.

(ii) unit assignment name. This parameter specifies the
particular disk drive. The unit should be specified
using the CONSTANT statement according to the naming
scheme of the operating system.

(iii) DISK: block name. This parameter identifies the
block type in the storage structure which corresponds
to an entire disk. The block name must appear as the
first parameter in a BLOCK statement. Header and
trailer parameters are interpreted as single track
blocks at the beginning and end tracks in the disk.

(iv) CYLINDER: cylinder name, ..., cylinder name. These
parameters identify those block types in the storage
structure which correspond to single cylinders. Each
name must appear as the first parameter in a BLOCK
statement. Header and trailer parameters are interpreted
as single track blocks at the beginning and end
tracks in the cylinder.

(v) TRACK: track name, ..., track name. These parameters
identify those block types in the storage structure
which correspond to single tracks. Each name must
appear as the first parameter in a BLOCK statement.
Header and trailer parameters are interpreted as single
track blocks at the beginning and end of the track.

(vi) TRACK BLOCK: bblock name, ..., bblock name. These
parameters are used to identify those basic block types
in the storage structure which correspond to single
physical track blocks. Each name must appear as the
first parameter in a BBLOCK statement. Header and trailer
parameters are interpreted as single track blocks
preceding and following the track block in question.

Example

The following statements relate a storage structure to
the disk medium for a file of fixed length records of type
'X'. The records are to be positioned in fixed length track
blocks which are the size of a track (74,000 bits) and there
are to be 90 records per track block. There is to be only one
type of track block and cylinder block. No labels are to
appear.

DISK ( 'DISK1'; DISK: 'DISK-BLOCK';
CYLINDER: 'CYLINDER-BLOCK';
TRACK: 'TRACK-BLOCK';
TRACK BLOCK: 'TRACK-BBLOCK' )

BLOCK ( 'DISK-BLOCK'; SPEC;
( 'CYLINDER-BLOCK', M, NOLIM, V ) )

BLOCK ( 'CYLINDER-BLOCK', SPEC;
( 'TRACK-BLOCK', M, SO, V ) )

BLOCK ( 'TRACK-BLOCK', SPEC;
( 'TRACK-BBLOCK', M, 1, F ) )

BBLOCK ( 'TRACK-BBLOCK'; 74000, FIXED;
90, 1, F; START: 'Z' )

3.4.4 TTY Statements

format  TTY  (  teletype  specification  name  [,  unit  assignment  name  ]; 

TTY  FILE:  block  name;  PAGE:  page  name,  ...,  page  name; 
LINE:  line  name,  ...,  line  name  ) 

parameters

(i) teletype specification name is an unindexed user-defined
name.

(ii) unit assignment name is a CONSTANT statement with the
format described in Section 1.4.2.

(iii) block name,

(iv) page name, and

(v) line name are unindexed user-defined names.

usage of the statement

The TTY statement is used to relate block and basic
block types of a storage structure to the medium of teletypewriter
input and output in terms of the following parameters:

usage of the parameters

(i) teletype specification name. This parameter is used
to refer to the relation being specified between a
storage structure and the teletypewriter medium.

(ii) unit assignment name. This parameter specifies the
particular teletypewriter.

(iii) TTY FILE: block name. This parameter is used to identify
the block type in the storage structure which
corresponds to the entire input or output from the
teletypewriter. The block name must appear as the first
parameter in a BLOCK statement. Headers and trailers
specified in this BLOCK statement are interpreted as
lines.

(iv) PAGE: page name, ..., page name. These parameters are
used to identify those block types in the storage structure
which correspond to single pages. Each page name
must appear as the first parameter in a BLOCK statement.
Header and trailer parameters are interpreted as lines
at the beginning and end of a page.

(v) LINE: line name, ..., line name. These parameters
are used to identify those basic block types in the
storage structure which correspond to single lines.
Each line name must appear as the first parameter in a
BBLOCK statement with the following parameters:

a) parameter (ii) is 120.

b) parameter (iii) is FIXED.

c) parameters (ix) and (x) are interpreted as columns
at the beginning and end of lines.


Example 


The following statements relate a storage structure to
the teletypewriter medium for a file of fixed length records
of type 'Q'. The records are to be positioned 1 per line and
there are to be 64 lines per page. There is to be only one
type of line and page and no headers or trailers.


TTY ( 'TTY1'; TTY FILE: 'TTY-BLOCK';
PAGE: 'TTY-PAGE';
LINE: 'TTY-LINE' )

BLOCK ( 'TTY-BLOCK'; SPEC;
( 'TTY-PAGE', M, NOLIM, V ) )

BLOCK ( 'TTY-PAGE', SPEC;
( 'TTY-LINE', M, 64, V ) )

BBLOCK ( 'TTY-LINE'; 120, FIXED;
1, 1, FIXED; START: 'Q' )




4.  Conversion  Specification  Statements 

The following terminology will be used in presenting the Conversion
Specification statements:

(i)  Data conversion is a process which, given a bit string
representation of a file (or set of files) on a storage medium, produces
the bit string representation on a (different) storage medium of
a file (or set of files) whose fields contain values obtained from the
fields of the first file (or set of files).

(ii)  Given a set of files X_1, ..., X_n whose data are to be converted
into a set of files Y_1, ..., Y_m, we call the files X_1, ..., X_n
the source files and the files Y_1, ..., Y_m the target files of the
conversion.

(iii)  The fields, groups and records of a source (target) file will
be called source (target) fields, source (target) groups, and source
(target) records, respectively.

(iv)  An association list is a list which gives, for each different
type of field in the target files, the source field which is to provide
its value, and any information needed to locate the source record
containing the source field.

The Conversion Specification statements are used to specify the
conversion of a set of source files to a set of target files in terms
of descriptions of those source and target files and an association list.
The Association Statements of Section 4.1 specify the association list and
the CONVERT statement of Section 4.2 specifies the conversion.

4.1  Association Statements

The association list identifies, for each field type in the target
file, a field type in the source file which provides its value. For a
given target record, values may be obtained from one or more source
records and from one or more repetitions of a single field or group
type in a source record.

To  refer  to  such  records,  fields  and  groups  unambiguously, 
reference  names  are  modified  in  the  following  way: 

The  record  (field,  group)  name  is  replaced  by  the  SOURCE  state¬ 
ment  of  Section  4.1.2. 

Reference  names  modified  in  this  way  are  also  called  reference 
names.  When  these  names  are  used  in  an  association  list,  they  refer 
to  records  (fields,  groups)  which  have  already  been  used  as  the  source 
of  values  for  target  fields. 

4.1.1  ASSOCIATE Statements

format

ASSOCIATE ( association name; ( association entry ), ...,
            ( association entry ) )

parameters

(i)  association name is an unindexed user-defined name.

(ii)  association entry is a string of parameters of the form:

      target name, source name
      [; R: criterion name] [; F/G: criterion name]

where:

a)  target name is a reference name.

b)  source name is a reference name.

c) and d)  criterion name is an unindexed user-defined name.

usage of the statement

The ASSOCIATE statement is used to specify, for each
field in the target files, the source field which is to provide
its value in terms of the following parameters:

usage of the parameters

(i)  association name. This parameter gives the name to
be used in referring to the association list being
specified.

(ii)  (association entry), ..., (association entry)

These parameters specify, for each type of target field,
a source field which is to provide the value. There
is one association entry for each type of target field
in each target file.

a)  target name. This parameter gives the type of
target field. The name must be modified by strings
of the form: OF record name, and OF file name, to
avoid ambiguities as to the record and file in
which the target field is to occur.

b)  source name. This parameter gives the type of
source field which is to provide the value for
the target field. This name must also be modified
by strings of the form: OF record name,
and OF file name, to avoid ambiguities as to the
record and file in which it occurs.

c)  R: criterion. This parameter gives the name of a
criterion which is used to select the source
record which contains the source field. The name
must appear as the first parameter in a CRITERION
statement.

d)  F/G: criterion. This parameter gives the name of
a criterion which is used to select a particular
occurrence of a repeating field or group to use as
a source field or group. The name must appear as
the first parameter in a CRITERION statement.

usage conventions

The following conventions must be observed in using the
ASSOCIATE statement wherever the source field is a repeating
field.

1.  If the target field (or parameter (ii)a) is not a
repeating field, then giving the source field name unmodified
by an index results in one target record being formed for each
repetition of the source field.

2.  If the target field is not a repeating field, then
giving the source field name modified by an index results in
only one target record being formed and the remaining repetitions
of the source field not being used.

3.  If the target field is to repeat an unlimited
number of times, then giving the source field name unmodified
by an index results in the target field repeating the same
number of times as the source field.

4.  If the target field is to repeat a fixed or bounded
number of times which is less than the number of times the
source field repeats, then giving the source field name unmodified
results in target records being formed until all
repetitions of the source field are used.

5.  If the target field is to repeat a fixed or bounded
number of times which is less than the number of times the
source field repeats, then modifying the target name and the
source name by an index and creating a separate entry for each
repetition of the target field results in only one target record
being formed and the remaining repetitions of the source field
not being used.
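The five conventions above can be illustrated with a small Python sketch. It is not part of GDDL; the function name form_target_records and its parameters target_limit and indices are hypothetical, chosen only to show how many target records each convention would produce from one repeating source field.

```python
from typing import List, Optional


def form_target_records(source_values: List[str],
                        target_limit: Optional[int] = 1,
                        indices: Optional[List[int]] = None) -> List[List[str]]:
    """Group the repetitions of one source field into target records.

    target_limit: 1 for a non-repeating target field, None for a target
                  field that repeats without limit, n for a bound of n.
    indices:      1-based source indices when the names are indexed.
    """
    if indices is not None:
        # Conventions 2 and 5: indexed names give exactly one target
        # record; the remaining source repetitions are not used.
        return [[source_values[i - 1] for i in indices]]
    if target_limit is None:
        # Convention 3: the target repeats as often as the source.
        return [list(source_values)]
    # Conventions 1 and 4: target records are formed until every
    # repetition of the source field has been used.
    return [source_values[i:i + target_limit]
            for i in range(0, len(source_values), target_limit)]


titles = ["SCIENCE I", "SCIENCE II", "SCIENCE III"]
print(form_target_records(titles))                  # convention 1
print(form_target_records(titles, indices=[1]))     # convention 2
print(form_target_records(titles, target_limit=2))  # convention 4
```

Treating conventions 1 and 4 as the same chunking rule (with a bound of 1 for a non-repeating target) is a simplification made for this sketch, not something the conventions themselves state.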

Example

Consider a source record of type 'PUBLICATION' in file 'F1'
and a target record of type 'PERSON' in file 'F2', satisfying
the following GDDL descriptions:

RECORD ( 'PUBLICATION', 'PUB-GRP' )

GROUP ( 'PUB-GRP', SPEC;
      ( 'NAME', M, 1, F ),
      ( 'AGE', M, 1, F ),
      ( 'SEX', M, 1, F ),
      ( 'BOOK-GRP', O, NOLIM, V ) )

GROUP ( 'BOOK-GRP', SPEC;
      ( 'TITLE', M, 1, F ),
      ( 'PAGES', M, 1, F ),
      ( 'DATE', M, 1, F ) )

FIELD ( 'NAME', ASCII, C, 15, F, C )

FIELD ( 'AGE', ASCII, C, 2, F, C )

FIELD ( 'SEX', ASCII, C, 1, F, C )

FIELD ( 'TITLE', ASCII, C, 20, F, C )

FIELD ( 'PAGES', ASCII, C, 4, F, C )

FIELD ( 'DATE', ASCII, C, 4, F, C )

RECORD ( 'PERSON', 'PERS-GRP' )

GROUP ( 'PERS-GRP', SPEC;
      ( 'NAME', M, 1, F ),
      ( 'AGE', M, 1, F ),
      ( 'SEX', M, 1, F ) )

FIELD ( 'NAME', EBCDIC, C, 15, F, C )

FIELD ( 'AGE', EBCDIC, C, 2, F, C )

FIELD ( 'SEX', EBCDIC, C, 1, F, C )

To specify the conversion of records of type 'PUBLICATION' to
records of type 'PERSON', the following association list
(called 'ASSOC12') is provided.

( 'AGE' OF 'PERSON' OF 'F2',
  'AGE' OF SOURCE ( 'NAME' OF 'PERSON' OF 'F2' ) ),

( 'SEX' OF 'PERSON' OF 'F2',
  'SEX' OF SOURCE ( 'NAME' OF 'PERSON' OF 'F2' ) ) )

The use of the system name, SOURCE, is explained in Section 4.1.2.


Using this association list, a record:

JONES  32  M  SCIENCE I  384  1958  SCIENCE II  501  1963

is converted as follows:

JONES  32  M

All source records of file 'F1' are converted in this way.

4.1.2  SOURCE Statements

format

SOURCE ( target name [ ; criterion name ] )

parameters

(i)  target name is a reference name.

(ii)  criterion name is an unindexed, user-defined name.

usage of the statement

The SOURCE statement is used as part of a reference
name to refer to a particular record, group or field which
was the source of a value for a target field. If a reference
to a particular source group or field that contains the
value for a certain target field is required, then the form:

      source reference name - SOURCE statement

is used. If, however, a reference to a source group or field
that appears in a record which contains the value for a certain
target field is required, then the usual form:

      source reference name OF SOURCE statement

is used. This distinction is important when the group or
field is repeating in the source record.

usage of the parameters

(i)  target name. This parameter gives the type of the
target field for which a value was provided.

(ii)  criterion name. This parameter is used when the target
field repeats and it is not possible to use an
index to identify the appropriate repetition. The
parameter names a criterion which identifies the
appropriate field. The name must appear as the first
parameter in a CRITERION statement.

Example

Consider a source record of the type 'PUBLICATION' described
in the Example of Section 4.1.1 and a target record of the type
'BOOK' in file 'F3' satisfying the following GDDL description:

RECORD ( 'BOOK', 'BOOK-GRP' )

GROUP ( 'BOOK-GRP', SPEC;
      ( 'TITLE', M, 1, F ),
      ( 'AUTHOR', M, NOLIM, V ),
      ( 'PAGES', M, 1, F ),
      ( 'DATE', M, 1, F ) )

FIELD ( 'TITLE', ASCII, C, 20, F, C )

FIELD ( 'AUTHOR', ASCII, C, 15, F, C )

FIELD ( 'PAGES', ASCII, C, 4, F, C )

FIELD ( 'DATE', ASCII, C, 4, F, C )

To specify the conversion of records of type 'PUBLICATION'
to records of type 'BOOK', the following association list
(called 'ASSOC13') is specified:

ASSOCIATE ( 'ASSOC13',

      ( 'TITLE' OF 'BOOK' OF 'F3',
        'TITLE' OF 'PUBLICATION' OF 'F1' ),

      ( 'AUTHOR' OF 'BOOK' OF 'F3',
        'NAME' OF 'PUBLICATION' OF 'F1'; 'CHITLj' ),

      ( 'DATE' OF 'BOOK' OF 'F3',
        'DATE' OF 'BOOK-GRP' OF SOURCE ( 'TITLE' OF 'BOOK' OF
        'F3' ) ),


are converted as follows:


4.2  CONVERT  Statements 


format

CONVERT ( SOURCE FILES: file name, ..., file name;
          TARGET FILES: file name, ..., file name;
          association name )

parameters

(i) and (ii)  file name is an unindexed user-defined name.

(iii)  association name is an unindexed user-defined name.

usage of the statement

The CONVERT statement is used to specify that a set of
source files is to be converted into a set of target files in
terms of the following parameters:

usage of the parameters

(i)  SOURCE FILES: file name, ..., file name.
These parameters give the names of the source files
to be converted. Each name must appear as the first
parameter in a FILE statement.

(ii)  TARGET FILES: file name, ..., file name.
These parameters give the names of the target files to
be created. Each name must appear as the first parameter
in a FILE statement.

(iii)  association list. This parameter gives the name of
the association list which specifies, for each field
in the target files, the field in the source files
which is to provide its value.

Example

Consider a file 'F1' consisting only of records of type
'PUBLICATION' and a file 'F2' consisting only of records
of type 'PERSON' as described in the example of Section 4.1.1.
The following statement specifies the conversion of file
'F1' into the file 'F2':

CONVERT ( SOURCE FILES: 'F1';
          TARGET FILES: 'F2';
          'ASSOC12' )
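As a rough illustration of what such a CONVERT statement asks for, the following Python sketch converts an in-memory 'F1' into 'F2'. Everything in it is an assumption made for the illustration: the dictionaries ASSOC12 and PERSON_WIDTHS, the function names, the representation of records as dictionaries, the presence of a 'NAME' entry in the association list, and the choice of the cp037 codec as a stand-in for EBCDIC. The field widths 15, 2 and 1 come from the FIELD statements of Section 4.1.1.

```python
# Hypothetical in-memory model of the conversion; not GDDL syntax.
ASSOC12 = {"NAME": "NAME", "AGE": "AGE", "SEX": "SEX"}  # target field -> source field
PERSON_WIDTHS = {"NAME": 15, "AGE": 2, "SEX": 1}        # fixed lengths of the target fields


def convert_record(source, assoc, widths, codec="cp037"):
    """Build one fixed-length target record from one source record."""
    out = b""
    for target_field, source_field in assoc.items():
        width = widths[target_field]
        # Pad or truncate to the fixed target length, then re-encode.
        out += source[source_field].ljust(width)[:width].encode(codec)
    return out


def convert_file(source_file, assoc, widths):
    """Convert every source record; the book group is simply not referenced."""
    return [convert_record(record, assoc, widths) for record in source_file]


f1 = [{"NAME": "JONES", "AGE": "32", "SEX": "M",
       "BOOK-GRP": [("SCIENCE I", "384", "1958"), ("SCIENCE II", "501", "1963")]}]
f2 = convert_file(f1, ASSOC12, PERSON_WIDTHS)
print(len(f2), len(f2[0]))
```

The sketch produces one 18-byte target record per source record, which matches the example of Section 4.1.1 in which the repeating book group contributes nothing to the 'PERSON' record.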


Appendix  B 

EXAMPLES OF GDDL DESCRIPTIONS

The examples presented in this Appendix demonstrate the descriptive
power of GDDL.

Example 1 illustrates the interfacing of files with new programs
and different programming languages.

Example 2 illustrates the interfacing of files with different
operating systems.

The data structures selected for use in these examples are actual
structures which have been implemented and are currently in use.


Example 1.


Use of GDDL to Convert Files for Interfacing
with New Programs and Different Programming Languages


This example demonstrates how GDDL can be used to describe the
conversion of a file of records created by a program written in an
Assembly Language to a form in which the data contained in the file
can be used by a COBOL program. The records to be converted are
records produced by the Extended Data Management Facility (EDMF) of
the University of Pennsylvania. This conversion requires the description
of a set of data in a complex structure at the record level.

To  perform  this  conversion,  all  that  is  required  is; 

1)  a  GDDL  description  of  the  file  of  EDMF  records; 

2)  a  GDDL  description  of  a  file  which  can  be  processed  by  the 
COBOL  program; 

3)  a  GDDL  description  of  the  relationships  between  values  in  the 
COBOL  records  and  those  values  in  the  EDMF  records;  and 

4)  the  file  of  EDMF  records. 

The conversion process for this example is illustrated in Figure 1-1.

The descriptions of the file of EDMF records, the COBOL file and
the relationship between them are discussed separately.


EDMF Records      COBOL Records

Figure 1-1  EDMF to COBOL Record Conversion

1.1  The  File  of  EDMF  Records 

The EDMF provides its users with a very comprehensive record
organization feature. Data in the EDMF consists of two parts: the
first part is called an attribute and the second part is called a value.
An EDMF record is a collection of such attribute-value pairs. These
pairs can be organized into a hierarchic structure in which attributes
may repeat, and occur optionally, and in which values may have fixed
or variable sizes and may be interpreted as numeric or alphanumeric
strings.

In this example each of the user's EDMF records is to contain
data on a book. This data will include:

i)  a code number for each book. The code number has the
attribute CODE NUMBER. This attribute occurs exactly once in each
record. The values have a fixed size of 8 bytes and are stored as an
alphanumeric string.

ii)  the authors' names. An author has the attribute AUTHOR.
This attribute must occur one or more times in each record. The values
for this attribute have variable sizes and are stored as alphanumeric
strings.

An example of a collection of attribute-value pairs satisfying
this description is given in Figure 1-2.


CODE NUMBER, BINER540
AUTHOR, BIVENS, R.L.
AUTHOR, METROPOLIS, N.

Figure 1-2  Example of a Collection of Attribute-Value
Pairs Forming an EDMF Record

These attribute-value pairs are stored in the organization
illustrated in Figure 1-3.


Storage Required    Data Stored in EDMF Record

CORE FORMAT HEADER
3 BYTES             SIZE OF RECORD
5 BYTES             REFERENCE NUMBER, UNPACKED
1 BYTE              CONTROL INFORMATION

ATTRIBUTE VALUE ENTRY
3 BYTES             LENGTH OF ATTR.-VALUE ENTRY
1 BYTE              CONTROL INFORMATION
1 BYTE              NUMBER OF DIRECTORY LISTS
2 BYTES             LENGTH OF ATTRIBUTE
VARIABLE            ATTRIBUTE
3 BYTES             LENGTH OF VALUE
VARIABLE            VALUE

ATTRIBUTE VALUE ENTRY
3 BYTES             LENGTH OF ATTR.-VALUE ENTRY
1 BYTE              CONTROL INFORMATION
1 BYTE              NUMBER OF DIRECTORY LISTS
2 BYTES             LENGTH OF ATTRIBUTE
VARIABLE            ATTRIBUTE
3 BYTES             LENGTH OF VALUE
VARIABLE            VALUE

  ...

Figure 1-3  EDMF Record

The length specified in the 3 byte "Size of Record" entry includes
the 9 byte Header size.
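The layout of Figure 1-3 can be made concrete with a short Python sketch that builds and then parses one record. Several points are assumptions made only for this illustration and are not stated in the text: that the length fields are unsigned big-endian binary integers, that the entry length counts its own 3 bytes, that control bytes are zero, and that Python's cp037 codec is an acceptable stand-in for EBCDIC. The function names are hypothetical.

```python
CODEC = "cp037"  # assumption: one common EBCDIC code page


def build_entry(attribute: str, value: str) -> bytes:
    """Encode one attribute-value entry in the field order of Figure 1-3."""
    attr = attribute.encode(CODEC)
    val = value.encode(CODEC)
    body = (
        b"\x00"                          # control information (1 byte)
        + b"\x00"                        # number of directory lists (1 byte)
        + len(attr).to_bytes(2, "big")   # length of attribute (2 bytes)
        + attr
        + len(val).to_bytes(3, "big")    # length of value (3 bytes)
        + val
    )
    # Assumption: the entry length includes its own 3 bytes.
    return (len(body) + 3).to_bytes(3, "big") + body


def build_record(ref_no: str, pairs) -> bytes:
    entries = b"".join(build_entry(a, v) for a, v in pairs)
    size = 9 + len(entries)              # record size includes the 9-byte header
    ref = ref_no.encode(CODEC)[:5].ljust(5, b"\x40")  # 0x40 is the EBCDIC space
    return size.to_bytes(3, "big") + ref + b"\x00" + entries


def parse_record(data: bytes):
    """Return the (attribute, value) pairs stored in one record."""
    size = int.from_bytes(data[0:3], "big")
    pos = 9                              # skip the header
    pairs = []
    while pos < size:
        pos += 3 + 1 + 1                 # entry length, control info, directory count
        attr_len = int.from_bytes(data[pos:pos + 2], "big")
        pos += 2
        attribute = data[pos:pos + attr_len].decode(CODEC)
        pos += attr_len
        val_len = int.from_bytes(data[pos:pos + 3], "big")
        pos += 3
        value = data[pos:pos + val_len].decode(CODEC)
        pos += val_len
        pairs.append((attribute, value))
    return pairs


record = build_record("00001", [("CODE NUMBER", "BINER540"),
                                ("AUTHOR", "BIVENS, R.L."),
                                ("AUTHOR", "METROPOLIS, N.")])
print(parse_record(record))
```

The parser reads each value using the length stored immediately before it, so variable-size values such as the authors' names need no delimiter.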

It is assumed that the user's EDMF records, ordered sequentially
by CODE NUMBER, will be input on cards. The conversion process must
extract the values from these records that are to be organized for use
with the COBOL program.

The GDDL description of the file of EDMF records is given below.
Following the GDDL description are explanations of the GDDL statements
(as they are needed).

GDDL Description of the file of EDMF records:

1.  FILE ( 'EDMF FILE'; 'EDMF-LK'; 'EDMF-BLK'; 'EDMF CARDS' )

2.  LINK ( 'EDMF-LK'; 'BOOK EDMF', 'BOOK EDMF';
         'CRIT-EDMF', SEQUEN; 1, FIXED )

3.  CRITERION ( 'CRIT-EDMF', ( 'CRIT-E1' ) AND ( 'CRIT-E2' ) )

4.  CRITERION ( 'CRIT-E1', ( 'CODE NUMBER' OF OCC ( 'BOOK EDMF',
         H )) LT ( 'CODE NUMBER' OF OCC ( 'BOOK EDMF', T )) )

5.  CRITERION ( 'CRIT-E2', ALLOCC ( XI; NOT (
         (( 'CODE NUMBER' OF OCC ( 'BOOK EDMF', XI )) LT
         ( 'CODE NUMBER' OF OCC ( 'BOOK EDMF', T )) AND
         (( 'CODE NUMBER' OF OCC ( 'BOOK EDMF', H )) LT
         ( 'CODE NUMBER' OF OCC ( 'BOOK EDMF', XI )) ) ) ))

6.  RECORD ( 'BOOK EDMF', 'BOOK GROUP' )

7.  GROUP ( 'BOOK GROUP', SPEC;
         ( 'HEADER', M, 1, FIXED ),
         ( 'DATA1', M, 1, FIXED ),
         ( 'DATA2', M, NOLIM, V ) )

8.  GROUP ( 'HEADER', SPEC;
         ( 'RCD SIZE', M, 1, FIXED ),
         ( 'REF NO', M, 1, FIXED ),
         ( 'CONTROL INFO', M, 1, FIXED ) )

9.  FIELD ( 'RCD SIZE', EBCDIC, C, 3, FIXED, B )

10. FIELD ( 'REF NO', EBCDIC, C, 5, FIXED, C )

11. FIELD ( 'CONTROL INFO', EBCDIC, C, 1, FIXED, C )

12. GROUP ( 'DATA1', SPEC;
         ( 'A-V HEADER', M, 1, FIXED ),
         ( 'CODE ENTRY', M, 1, FIXED ) )

13. GROUP ( 'A-V HEADER', SPEC;
         ( 'A-V LENGTH', M, 1, FIXED ),
         ( 'CONTROL INFO', M, 1, FIXED ),
         ( 'DIREC NO', M, 1, FIXED ) )

14. FIELD ( 'A-V LENGTH', EBCDIC, C, 3, FIXED, B )

15. FIELD ( 'DIREC NO', EBCDIC, C, 1, FIXED, C )

16. GROUP ( 'CODE ENTRY', SPEC;
         ( 'ATT LENGTH', M, 1, FIXED ),
         ( 'CODE NUMBER ATTRIB', M, 1, FIXED ),
         ( 'VAL LENGTH', M, 1, FIXED ),
         ( 'CODE NUMBER', M, 1, FIXED ) )

17. FIELD ( 'ATT LENGTH', EBCDIC, C, 2, FIXED, B )

18. FIELD ( 'CODE NUMBER ATTRIB', EBCDIC, C, 11, FIXED, C )

19. FIELD ( 'VAL LENGTH', EBCDIC, C, 3, FIXED, C )

20. FIELD ( 'CODE NUMBER', EBCDIC, C, 8, FIXED, C )

21. GROUP ( 'DATA2', SPEC;
         ( 'A-V HEADER', M, 1, FIXED ),
         ( 'AUTH ENTRY', M, 1, FIXED ) )

22. GROUP ( 'AUTH ENTRY', SPEC;
         ( 'ATT LENGTH', M, 1, FIXED ),
         ( 'AUTH ATTRIB', M, 1, FIXED ),
         ( 'VAL LENGTH', M, 1, FIXED ),
         ( 'AUTHOR', M, 1, FIXED ) )

23. FIELD ( 'AUTH ATTRIB', EBCDIC, C, 6, FIXED, C )

24. FIELD ( 'AUTHOR', EBCDIC, C, 'VAL LENGTH' OF 'AUTH ENTRY',
         V, C )

25. BLOCK ( 'EDMF-BLK'; ( 'CARD', M, NOLIM, V ),
         HDR: CONSTANT ( START, EBCDIC );
         TLR: CONSTANT ( END OF DATA, EBCDIC ) )

26. BBLOCK ( 'CARD'; 80, FIXED; NOLIM, 1, V; SPLIT: 'BOOK
         EDMF'; START: 'BOOK EDMF' )

27. CARD ( 'EDMF CARDS'; DECK: 'EDMF-BLK'; CARD: 'CARD' )


Explanations: 

Statement 24. In this FIELD statement the length parameter is
given indirectly. For each occurrence of the field 'AUTHOR' the
length of the value is the value of the field 'VAL LENGTH' in
the group 'AUTH ENTRY' in which the value of 'AUTHOR' occurs.

1.2  The COBOL Record Organization

For a COBOL program to use the data described in part 1, several
modifications must be made:

(i)  all variable length values must be converted to fixed
length.

(ii)  all groups and fields which may be repeated an unlimited
number of times must be converted to groups which repeat
no more than a fixed maximum number of times, where the
number of times the group or field repeats is stored as
a variable in the record.

(iii)  all unnecessary fields may be eliminated.

Thus, each COBOL record must include the following data:

(i)  a code number. The code number appears exactly once in
each record. It has a fixed size of 8 bytes and is stored
as an alphanumeric string.

(ii)  the authors' names. There may be no more than three
authors' names in each record. Each author's name has
a fixed size of 20 bytes and is stored as an alphabetic
string.

(iii)  a counter to contain the number of authors' names.

The organization of these values is illustrated in Figure 1-4.

DATA-2-COUNTER    CODE-NUMBER    AUTH    AUTH

Figure 1-4  COBOL Record

The COBOL description of such a record follows:

01  BOOK-RECORD.
    02  DATA-2-COUNTER  PICTURE IS S9999 USAGE IS COMPUTATIONAL.
    02  CODE-NUMBER     PICTURE IS X(8).
    02  DATA-2 OCCURS 3 TIMES DEPENDING ON DATA-2-COUNTER.
        03  AUTH        PICTURE IS X(20).

The conversion process will use the GDDL description of the EDMF
record to extract the values giving the code and authors from each EDMF
record. It will then use the GDDL description of the COBOL record to
organize these values into a COBOL record. These records are to be
ordered sequentially by CODE-NUMBER, and output on cards. The first
card is to contain the string: START and the final card is to contain
the string: END OF DATA.

The GDDL description of the COBOL file is given below:

GDDL Description of the COBOL file:

1.  FILE ( 'COBOL FILE'; 'COBOL-LK'; 'COBOL-BLK';
         'COBOL-CARDS' )

2.  LINK ( 'COBOL-LK'; 'BOOK COBOL', 'BOOK COBOL';
         'CRIT-COBOL', SEQUEN; 1, FIXED )

3.  CRITERION ( 'CRIT-COBOL', ( 'CRIT-C1' ) AND
         ( 'CRIT-C2' ) )

4.  CRITERION ( 'CRIT-C1', ( 'CODE-NUMBER' OF OCC ( 'BOOK
         COBOL', H )) LT ( 'CODE-NUMBER' OF OCC ( 'BOOK
         COBOL', T )) )

5.  CRITERION ( 'CRIT-C2', ALLOCC ( XI; NOT (
         (( 'CODE-NUMBER' OF OCC ( 'BOOK COBOL', XI )) LT
         ( 'CODE-NUMBER' OF OCC ( 'BOOK COBOL', T )) AND
         (( 'CODE-NUMBER' OF OCC ( 'BOOK COBOL', H )) LT
         ( 'CODE-NUMBER' OF OCC ( 'BOOK COBOL', XI ))))))

6.  RECORD ( 'BOOK COBOL', 'BOOK GROUP' )

7.  GROUP ( 'BOOK GROUP', SPEC;
         ( 'DATA-2-COUNTER', M, 1, FIXED ),
         ( 'CODE-NUMBER', M, 1, FIXED ),
         ( 'DATA-2', M, 3, VARIABLE ) )

8.  FIELD ( 'DATA-2-COUNTER', EBCDIC, C, FIXED, B )

9.  FIELD ( 'CODE-NUMBER', EBCDIC, C, 8, FIXED, C )

10. GROUP ( 'DATA-2', SPEC; ( 'AUTH', M, 1, FIXED ) )

11. FIELD ( 'AUTH', EBCDIC, C, 20, FIXED, C )

12. BLOCK ( 'COBOL-BLK'; ( 'CCARD', NOLIM, V ),
         HDR: CONSTANT ( START, EBCDIC );
         TLR: CONSTANT ( END OF DATA, EBCDIC ) )

13. BBLOCK ( 'CCARD'; 80, FIXED; NOLIM, 1, V;
         SPLIT: 'BOOK COBOL'; START: 'BOOK COBOL' )

14. CARD ( 'COBOL-CARDS'; DECK: 'COBOL-BLK'; CARD: 'CARDS' )
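Statements 3 to 5 of the file descriptions in this example state the sequential ordering as a pair of criteria on a head occurrence H and a tail occurrence T. One reading of those criteria, sketched here in Python, is that H is linked to T when the key of H is less than the key of T and no other occurrence has a key strictly between them. The function names are hypothetical, and Python string comparison stands in for whatever collating sequence LT actually uses.

```python
def is_linked(head: str, tail: str, occurrences) -> bool:
    """Check the two criteria for a link from head (H) to tail (T)."""
    # First criterion: key of H is less than key of T.
    crit1 = head < tail
    # Second criterion: for all occurrences XI, it is NOT the case that
    # XI < T and H < XI, i.e. nothing lies strictly between H and T.
    crit2 = all(not (xi < tail and head < xi) for xi in occurrences)
    return crit1 and crit2


def successor_links(keys):
    """List every (head, tail) pair that satisfies both criteria."""
    return [(h, t) for h in keys for t in keys if is_linked(h, t, keys)]


codes = ["BINER540", "ADAMS001", "CLARK777"]
print(sorted(successor_links(codes)))
```

Under this reading the criteria pick out exactly the immediate-successor pairs, so the links chain the record occurrences in ascending key order regardless of the order in which they were supplied.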


1.3  The  Relationship  Description 

To convert records from the EDMF record format to the COBOL record
format, the user must also specify where each value for the COBOL records
is to be found in the EDMF records. This is specified in the single
GDDL statement given below:

ASSOCIATE ( 'ASSOC LIST';

      ( 'CODE-NUMBER' OF 'BOOK COBOL' OF 'COBOL FILE',
        'CODE NUMBER' OF 'BOOK EDMF' OF 'EDMF FILE' ),

      ( 'AUTH' OF 'DATA-2'(1) OF 'BOOK COBOL' OF 'COBOL FILE',
        'AUTHOR' OF 'DATA2'(1) OF SOURCE ( 'CODE-NUMBER' )),

      ( 'AUTH' OF 'DATA-2'(2) OF 'BOOK COBOL' OF 'COBOL FILE',
        'AUTHOR' OF 'DATA2'(2) OF SOURCE ( 'CODE NUMBER' )),

      ( 'AUTH' OF 'DATA-2'(3) OF 'BOOK COBOL' OF 'COBOL FILE',
        'AUTHOR' OF 'DATA2'(3) OF SOURCE ( 'CODE NUMBER' )),

      ( 'DATA-2-COUNTER' OF 'BOOK COBOL' OF 'COBOL FILE',
        COUNT ( 'DATA-2' OF 'BOOK COBOL' )) )

Given the values of an EDMF record organization as described in
section 1.1, this GDDL statement specifies that:

(i)  the value of the field 'CODE-NUMBER' in the COBOL record comes
from the field 'CODE NUMBER' in the EDMF record.

(ii)  the values of 'AUTH' in the repeating group 'DATA-2' of the
COBOL record come from the values of 'AUTHOR' in the repeating
group 'DATA2' in the EDMF record. If there are more than
three values of 'AUTHOR' in the EDMF record, these are not
used.

(iii)  the value of 'DATA-2-COUNTER' is the number of values for
'AUTH' in the COBOL record being created.

These relationships are illustrated in Figure 1-5.
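The three relationships listed above can be sketched in Python as a function that lays out one COBOL record from the attribute-value pairs of one EDMF record. The function name, the pair representation, the cp037 codec and the 2-byte big-endian encoding chosen for the S9999 COMPUTATIONAL counter are all assumptions made for this illustration; the field sizes of 8 and 20 bytes and the limit of three authors come from part 1.2.

```python
def to_cobol_record(pairs, max_authors=3, codec="cp037") -> bytes:
    """Lay out DATA-2-COUNTER, CODE-NUMBER and up to three AUTH values."""
    # (i) CODE-NUMBER comes from the single CODE NUMBER attribute.
    code = next(value for attribute, value in pairs if attribute == "CODE NUMBER")
    # (ii) AUTH values come from AUTHOR; values beyond the third are not used.
    authors = [value for attribute, value in pairs if attribute == "AUTHOR"][:max_authors]
    # (iii) the counter is the number of AUTH values in this COBOL record.
    # Assumption: the counter occupies 2 bytes of signed binary.
    record = len(authors).to_bytes(2, "big", signed=True)
    record += code.ljust(8)[:8].encode(codec)
    for name in authors:
        # Variable-length names become fixed-length 20-byte fields.
        record += name.ljust(20)[:20].encode(codec)
    return record


pairs = [("CODE NUMBER", "BINER540"),
         ("AUTHOR", "BIVENS, R.L."),
         ("AUTHOR", "METROPOLIS, N.")]
record = to_cobol_record(pairs)
print(len(record))
```

Padding with spaces and truncating at the field width is one simple way to satisfy the fixed-length requirement; the text does not say how over-long names should be handled.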

1.4  The  Complete  Description  of  the  Conversion 

The GDDL statements given in Parts 1.1, 1.2, and 1.3 of this
example, together with the GDDL statement given below, complete the
description requirements for the conversion of the file of EDMF records
to a form that can be used by a COBOL program.

CONVERT ( SOURCE FILES: 'EDMF FILE'; TARGET FILES: 'COBOL FILE';
          'ASSOC LIST' )


Figure 1-5  Relationships between fields in
the COBOL record and fields in the EDMF
record


Example 2.

Use of GDDL to Convert Files for Interfacing
with Different Operating Systems

This example demonstrates how GDDL can be used to describe the
conversion of a file created under one operating system to a form in
which it can be accessed under a different operating system. The data
to be converted is in the form of fixed length records stored on tape
by the Sequential Access Method (SAM) of the RCA SPECTRA 70/46 Time
Sharing Operating System (TSOS). This file is to be converted into a
file on disk that can be accessed by the Indexed Sequential Access
Method (ISAM) of the RCA SPECTRA 70/46 Tape Disk Operating System (TDOS).
This conversion requires the description of sets of data in complex
structures at the file and storage levels.

To  perform  this  conversion,  the  following  descriptions  and  data 
are  required; 

1)  a  GDDL  description  of  the  structure  of  the  SAM  file; 

2)  a  GDDL  description  of  the  structure  of  the  ISAM  file; 

3)  a  GDDL  description  of  the  relationships  between  data  items 
in  the  SAM  file  and  those  data  items  in  the  ISAM  file;  and 

4)  the SAM file.

The conversion process is illustrated in Figure 2-1.

The descriptions of the SAM file, the ISAM file and the relationship
descriptions are discussed separately.

Figure 2-1  TSOS SAM File to TDOS ISAM
File Conversion

2.1  The TSOS SAM File Organization

The TSOS SAM File of this example is a file produced by a COBOL
program. The file contains information on bills and is assigned the
name BILL-FILE. Each record, called BILL-RECORD, is of fixed length
with the organization shown in Figure 2-2.


FILE SECTION.
FD  BILL-FILE
    RECORDING MODE IS F
    LABEL RECORD IS STANDARD
    DATA RECORD IS BILL-RECORD.
01  BILL-RECORD.
    02  VENDOR-NAME  PICTURE IS X(30).
    02  BILL-NO      PICTURE IS X(10).
    02  PART-NO      PICTURE IS X(10).
    02  SERIAL-NO    PICTURE IS X(10).
    02  AMT          PICTURE IS X(6).
    02  COST         PICTURE IS X(6).
    02  DATE         PICTURE IS X(6).
    02  DEPT-NO      PICTURE IS X(2).


The file was produced using TSOS SAM, which produces a tape
with the organization illustrated in Figure 2-3.


'STANDARD VOLUME LABEL'  'STANDARD FILE HDR LABEL'  'TM'  'BILL-BLK'  ...

...  'BILL-BLK'  'STANDARD FILE TLR LABEL'  'TM'  'TM'

Figure 2-3  Storage Image of BILL-FILE


Each of the labels contains 80 bytes. Each physical block, 'BILL-BLK',
on the tape has the organization illustrated in Figure 2-4.


'KEY'  'BILL-RECORD' #1  ...  'BILL-RECORD' #25      (2064 bytes)

Figure 2-4  Storage Structure of
'BILL-BLK'

Each 'BILL-BLK' contains 25 'BILL-RECORD' record occurrences.
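The figures quoted for BILL-RECORD and 'BILL-BLK' can be cross-checked with a few lines of Python. The sketch sums the PICTURE IS X(n) clauses of the COBOL description to obtain the record length; the constant BILL_RECORD and the function record_length are hypothetical names introduced only for this check.

```python
import re

# The elementary items of BILL-RECORD, copied from the COBOL description.
BILL_RECORD = """
02 VENDOR-NAME PICTURE IS X(30).
02 BILL-NO PICTURE IS X(10).
02 PART-NO PICTURE IS X(10).
02 SERIAL-NO PICTURE IS X(10).
02 AMT PICTURE IS X(6).
02 COST PICTURE IS X(6).
02 DATE PICTURE IS X(6).
02 DEPT-NO PICTURE IS X(2).
"""


def record_length(cobol_text: str) -> int:
    """Sum the sizes of all alphanumeric PICTURE IS X(n) items."""
    sizes = re.findall(r"PICTURE\s+IS\s+X\s*\((\d+)\)", cobol_text)
    return sum(int(n) for n in sizes)


length = record_length(BILL_RECORD)
print(length)               # bytes per BILL-RECORD
print(25 * length)          # bytes of record data per BILL-BLK
print(2064 - 25 * length)   # bytes of the block not occupied by records
```

The record length works out to 80 bytes, so 25 records occupy 2000 bytes and the remaining 64 bytes of the 2064-byte block are left for material other than record data, such as the 'KEY' shown in Figure 2-4.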


The  GDDL  description  of  the  ISOS  SAM  file  is  given  below.  For 
brevity,  the  description  of  the  records  are  omitted. 

GDDL  Description  of  the  ISOS  SAM  file  BILL-FILE: 

1.  FILE ( 'BILL-FILE'; 'BILL-LINK'; 'BILL-TAPE'; 'TAPE' )

2.  LINK ( 'BILL-LINK'; 'BILL-RECORD', 'BILL-RECORD';
           'BILL-CRIT', SEQUEN; 1, FIXED )

3.  CRITERION ( 'BILL-CRIT', ( 'CRIT-B1' ) AND ( 'CRIT-B2' ) )

4.  CRITERION ( 'CRIT-B1',
        ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', H )) LT
        ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', T )) )

5.  CRITERION ( 'CRIT-B2', ALLOCC ( X1; NOT (
        (( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', X1 )) LT
         ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', T ))) AND
        (( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', H )) LT
         ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', X1 ))) ) ) )

6.  The GDDL statements describing the record
    BILL-RECORD appear here.

7.  BLOCK ( 'BILL-TAPE'; ( 'BILL-BLK', M, NOLIM, V ),
        HDR: 'STANDARD VOLUME LABEL';
        HDR: 'STANDARD FILE HDR LABEL';
        HDR: 'TM';
        TLR: 'STANDARD FILE TLR LABEL';
        TLR: 'TM';
        TLR: 'TM' )

8.  FIELD ( 'STANDARD VOLUME LABEL', EBCDIC, C, 80, FIXED, C )

9.  FIELD ( 'STANDARD FILE HDR LABEL', EBCDIC, C, 80, FIXED, C )

10. FIELD ( 'TM', B, B, 8, FIXED, C )

11. FIELD ( 'STANDARD FILE TLR LABEL', EBCDIC, C, 80, FIXED, C )

12. BBLOCK ( 'BILL-BLK'; 2064, FIXED; 80, 1, FIXED;
        START: 'BILL-RECORD'; HDR: 'KEY' )

13. FIELD ( 'KEY', EBCDIC, C, 2, FIXED, C )

14. TAPE ( 'TAPE'; TAPE: 'BILL-TAPE'; TAPE BLOCK: 'BILL-BLK' )

2.2  The TDOS ISAM File Organization

For the file described in part 2.1 to be accessible to TDOS ISAM,
two modifications must be made:

(i)  the records must be stored on disk under TDOS ISAM conventions;
     and

(ii) hierarchic indices must be created.

The emphasis of this part of the example will be on describing the
overall structure of these indices rather than on the details of their
implementation. Therefore, we will not discuss the addressing scheme
or the implementation of the pointers stored in the indices.

The new file will be assigned the name BILL-FL to conform with
TDOS naming conventions. BILL-FL will be organized as follows:

(i)   the records will be ordered sequentially according to the
      values of the field SERIAL-NO.

(ii)  the records will be blocked. Each disk block will be the
      size of a disk track. To simplify the example, it will
      be assumed that no overflow tracks are to be reserved.
      19 tracks per cylinder will be used to store records.

(iii) three levels of indices will be created:

      1) a track index for each data cylinder. One track per
         cylinder of records will be reserved for an index.
         Each record in this index will point to a track block.
         These records are not blocked.

      2) an index cylinder. To simplify this example, it will
         be assumed that only one index cylinder is needed.
         Each record in this index will point to a track index
         of a data cylinder. Records will be blocked. Each disk
         block will be the size of a disk track.

      3) a track index for the index cylinder. One track of
         the index cylinder is reserved for an index. Each
         entry in this index will point to a disk block.
         Records are not blocked.
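
The three index levels listed above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: each index entry is taken to hold the highest key of the unit it points to (a common ISAM convention that the text does not state), list positions stand in for the pointers whose implementation is deliberately left out of this example, and `records_per_track` and `entries_per_index_track` are hypothetical parameters.

```python
from bisect import bisect_left

TRACKS_PER_CYLINDER = 19  # data tracks per cylinder, as stated in the text


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build(keys, records_per_track, entries_per_index_track):
    """Block sorted keys into tracks and cylinders and build three indices.

    Assumption: every index entry is the highest key of the unit it covers.
    """
    tracks = chunk(sorted(keys), records_per_track)
    cylinders = chunk(tracks, TRACKS_PER_CYLINDER)
    # Level 1: one track index per data cylinder (one entry per data track).
    track_indices = [[track[-1] for track in cyl] for cyl in cylinders]
    # Level 2: the index cylinder, one entry per data cylinder, blocked.
    index_cylinder = chunk([ti[-1] for ti in track_indices],
                           entries_per_index_track)
    # Level 3: track index for the index cylinder (one entry per block).
    top_index = [block[-1] for block in index_cylinder]
    return cylinders, track_indices, index_cylinder, top_index


def find(key, cylinders, track_indices, index_cylinder, top_index):
    """Return (cylinder, track, slot) of `key`, or None if it is absent."""
    b = bisect_left(top_index, key)
    if b == len(top_index):
        return None                      # key is above every stored key
    e = bisect_left(index_cylinder[b], key)
    c = b * len(index_cylinder[0]) + e   # data cylinder number
    t = bisect_left(track_indices[c], key)
    track = cylinders[c][t]
    s = bisect_left(track, key)
    if s < len(track) and track[s] == key:
        return (c, t, s)
    return None
```

A lookup descends from the track index of the index cylinder, through the index cylinder, to the track index of one data cylinder, and finally searches a single data track.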

The organization of BILL-FL is illustrated in Figure 2-5.

[Figure 2-5  Organization of BILL-FL (label shown: Index Cylinder)]

The RCA TDOS ISAM description of this file follows:

BILL-FL  DTFIS  BLKSIZE=7241,
                KEYLEN=10,
                KEYLOC=51,
                PRIDEVT=DISK90,
                PRIDEX=0,
                RECFORM=FIXED,
                RECSIZE=80,
                UPINDEX=DISK90
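
The key location, key length and record size in this description can be cross-checked against the BILL-RECORD layout of part 2.1. The Python sketch below does that arithmetic. It assumes that the key location is a byte position counted from 1, which is an interpretation rather than something the text states; the field widths are copied from the PICTURE clauses.

```python
# Field widths copied from the BILL-RECORD description (PICTURE IS X(n)).
BILL_RECORD = [
    ("VENDOR-NAME", 30),
    ("BILL-NO", 10),
    ("PART-NO", 10),
    ("SERIAL-NO", 10),
    ("AMT", 6),
    ("COST", 6),
    ("DATE", 6),
    ("DEPT-NO", 2),
]


def layout(fields):
    """Return {name: (first_byte, length)} using 1-based byte positions."""
    positions = {}
    next_byte = 1
    for name, width in fields:
        positions[name] = (next_byte, width)
        next_byte += width
    return positions


def record_size(fields):
    """Total record length in bytes."""
    return sum(width for _, width in fields)


positions = layout(BILL_RECORD)
key_loc, key_len = positions["SERIAL-NO"]
print(key_loc, key_len, record_size(BILL_RECORD))
```

Under that assumption SERIAL-NO starts at byte 51 and is 10 bytes long, and the record is 80 bytes long, which agrees with the key and record-size parameters above.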

The conversion process will use the GDDL description of the TSOS
SAM file, BILL-FILE, to extract the values from the tape. It will
then use the GDDL description given below to organize these values into
the TDOS ISAM file format.

The GDDL description of the file BILL-FL is given below. For brevity,
the descriptions of the records and device labels are omitted.

GDDL Description of TDOS ISAM File BILL-FL:

1.  FILE ( 'BILL-FL'; 'BILLS-SEQ1', 'BILLS-SEQ2'; 'FILE-BLOCK';
        'DISK' )

2.  LINK ( 'BILLS-SEQ1'; 'BILL-RECORD', 'BILL-RECORD';
        'BILLS-SEQ-CRIT', SEQUEN; 1, FIXED )

3.  LINK ( 'BILLS-SEQ2'; 'BILL-RECORD', 'BILL-RECORD';
        'BILLS-SEQ-CRIT', DIREC, 'BX-PTR', 'BINDX';
        1, FIXED )

4.  CRITERION ( 'BILLS-SEQ-CRIT', ( 'BSQ1' ) AND ( 'BSQ2' ) )

5.  CRITERION ( 'BSQ1',
        ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', H )) LT
        ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', T )) )

6.  CRITERION ( 'BSQ2', ALLOCC ( X1; NOT (
        (( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', X1 )) LT
         ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', T ))) AND
        (( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', H )) LT
         ( 'SERIAL-NO' OF OCC ( 'BILL-RECORD', X1 ))) ) ) )

7.  The GDDL statements describing the record
    BILL-RECORD appear here.

8.  FILE ( 'BINDX'; 'BX-SEQ1', 'BX-SEQ2'; 'FILE-BLOCK'; 'DISK' )

9.  LINK ( 'BX-SEQ1'; 'BX-RECORD', 'BX-RECORD';
        'BILLS-SEQ-CRIT', SEQUEN; 1, FIXED )

10. LINK ( 'BX-SEQ2'; 'BX-RECORD', 'BX-RECORD';
        'BILLS-SEQ-CRIT', DIREC, 'BXX-PTR', 'BINDXX';
        1, FIXED )

11. The GDDL statements describing the record
    BX-RECORD appear here. These records
    form the index for each data cylinder.

12. FILE ( 'BINDXX'; 'BXX-SEQ1', 'BXX-SEQ2'; 'FILE-BLOCK'; 'DISK' )

13. LINK ( 'BXX-SEQ1'; 'BXX-RECORD', 'BXX-RECORD';
        'BILLS-SEQ-CRIT', SEQUEN; 1, FIXED )

14. LINK ( 'BXX-SEQ2'; 'BXX-RECORD', 'BXX-RECORD';
        'BILLS-SEQ-CRIT', DIREC, 'BXXX-PTR', 'BINDXXX';
        1, FIXED )

15. The GDDL statements describing the record
    BXX-RECORD appear here. These records
    form the index on the index cylinder.

16. FILE ( 'BINDXXX'; 'BXXX-SEQ1'; 'INDEX-CYL'; 'DISK' )

17. LINK ( 'BXXX-SEQ1'; 'BXXX-RECORD', 'BXXX-RECORD';
        'BILLS-SEQ-CRIT', SEQUEN; 1, FIXED )

18. The GDDL statements describing the record
    BXXX-RECORD appear here. These records
    form the index for the index cylinder.

19. BLOCK ( 'FILE-BLOCK';
        ( 'INDEX-CYL', O, 1, V ),
        ( 'DATA-CYL', M, NOLIM, V ) )

20. BLOCK ( 'INDEX-CYL';
        ( 'I-INDEX-TRACK', M, 1, FIXED ),
        ( 'I-DATA-TRACK', M, 19, VARIABLE ) )

21. BLOCK ( 'I-INDEX-TRACK'; ( 'I-INDEX-BLOCK', M, NOLIM, V ) )

22. BBLOCK ( 'I-INDEX-BLOCK'; NOLIM, V; 1, 1, FIXED;
        START: 'BXXX-RECORD' )

23. BLOCK ( 'I-DATA-TRACK'; ( 'I-DATA-BLOCK', M, 1, FIXED ) )

24. BBLOCK ( 'I-DATA-BLOCK'; 74,000, FIXED; NOLIM, 1, V;
        START: 'BXX-RECORD' )

25. BLOCK ( 'DATA-CYL';
        ( 'D-INDEX-TRACK', M, 1, FIXED ),
        ( 'D-DATA-TRACK', M, 19, VARIABLE ) )

26. BLOCK ( 'D-INDEX-TRACK'; ( 'D-INDEX-BLOCK', M, NOLIM, V ) )

27. BBLOCK ( 'D-INDEX-BLOCK'; NOLIM, V; 1, 1, FIXED;
        START: 'BX-RECORD' )

28. BLOCK ( 'D-DATA-TRACK'; ( 'D-DATA-BLOCK', M, 1, FIXED ) )

29. BBLOCK ( 'D-DATA-BLOCK'; 74,000, FIXED; NOLIM, 1, V;
        START: 'BILL-RECORD' )

30. DISK ( 'DISK'; DISK: 'FILE-BLOCK';
        CYLINDER: 'INDEX-CYL', 'DATA-CYL';
        TRACK: 'I-INDEX-TRACK', 'I-DATA-TRACK',
               'D-INDEX-TRACK', 'D-DATA-TRACK';
        TRACK BLOCK: 'I-INDEX-BLOCK', 'I-DATA-BLOCK',
               'D-INDEX-BLOCK', 'D-DATA-BLOCK' )

31. The GDDL statements for describing the pointers
    appear here.

Explanations:

Statements 1 - 7. These statements describe the organization of
records of type BILL-RECORD in BILL-FL.

Statements 8 - 11. These statements describe the track index for
each data cylinder.

Statements 12 - 15. These statements describe the index cylinder.

Statements 16 - 18. These statements describe the track index for
the index cylinder.

Statements 19 - 31. These statements describe the storage structure
and its implementation for BILL-FL.

2.3  The Relationship Description

To convert data from the TSOS SAM file organization to the TDOS
ISAM file organization, the user must specify where each value for the
ISAM records is to be found in the SAM records. This is specified in
the single GDDL statement given below:

ASSOCIATE ( 'BILL-ASSOC';

( 'BILL-RECORD' OF 'BILL-FILE', 'BILL-RECORD' OF 'BILL-FL' ) )

Given the values of a SAM record, this GDDL statement specifies
that the record is to become an ISAM record without requiring any
conversion.

2.4  The Complete Description of the Conversion

The GDDL statements given in Parts 2.1, 2.2, and 2.3 of this
example, together with the GDDL statement given below, complete the
description requirements for the conversion of the TSOS SAM file,
'BILL-FILE', into the TDOS ISAM file, 'BILL-FL'.

CONVERT  (  SOURCE  FILES:  'BILL-FILE';  TARGET  FILES:  'BILL-FL'; 

'BILL-ASSOC'  ) 
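
As an informal reading of what this conversion asks for, the Python sketch below copies each source record unchanged (the ASSOCIATE pairing) and checks the sequencing criterion of the target file on the result. It is a sketch under assumptions, not the conversion processor itself: records are modelled as dictionaries, and the names `is_successor` and `convert` are hypothetical.

```python
def is_successor(head, tail, all_keys):
    """BILLS-SEQ-CRIT: `tail` directly follows `head` in SERIAL-NO order.

    BSQ1 requires H LT T; BSQ2 requires that no occurrence X1 satisfies
    both X1 LT T and H LT X1, that is, nothing lies strictly between them.
    """
    bsq1 = head < tail
    bsq2 = all(not (x < tail and head < x) for x in all_keys)
    return bsq1 and bsq2


def convert(source_records):
    """Return the source records, unchanged, in target key order."""
    target = sorted(source_records, key=lambda r: r["SERIAL-NO"])
    keys = [r["SERIAL-NO"] for r in target]
    for head, tail in zip(keys, keys[1:]):
        if not is_successor(head, tail, keys):
            raise ValueError(f"sequence criterion fails at {head!r}, {tail!r}")
    return target
```

Because the criterion uses a strict LT comparison, two records with the same SERIAL-NO cannot be linked, so the sketch rejects duplicate keys.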


Appendix  C 


RELATIONSHIP OF GDDL TO COBOL

In this Appendix, we show that record level GDDL is complete with
respect to COBOL and more general than COBOL. We show completeness by
demonstrating that GDDL can describe all record level options provided
by COBOL, and we show generality by giving a set of examples of
characteristics describable in GDDL but not provided by COBOL. We
give these demonstrations in terms of RCA SPECTRA 70/46 COBOL.


1.  The  Containment  of  the  COBOL  Record  Description  Features  in  GDDL 

In this section we demonstrate that GDDL can completely describe
the set of data structure options provided by each COBOL clause in the
record description part of the COBOL Data Division. This is done by
giving a procedure for converting any such COBOL clause into its GDDL
equivalent. We give below a complete list of such COBOL clauses. We
then present the statement (or statement part) in GDDL which corresponds
to each COBOL clause.

1.1  The  COBOL  Record  Description  Clauses 


1.  level-number { FILLER | data-name-1 }

2.  [ REDEFINES data-name-2 ]

3.  [ PICTURE IS picture string ]

4.  [ BLANK WHEN ZERO ]

5.  [ OCCURS integer TIMES ... ]

6.  [ USAGE IS { DISPLAY | COMPUTATIONAL | COMPUTATIONAL-1 |
                 COMPUTATIONAL-2 | COMPUTATIONAL-3 | INDEX } ]

7.  [ JUSTIFIED RIGHT ]

8.  [ VALUE IS literal ]

1.2  The  COBOL  to  GDDL  Translation  Procedure 

Each COBOL clause and the procedure for translating it to a GDDL
statement are now given:

1.  level-number { FILLER | data-name-1 }


In COBOL, the level-number is used to describe a hierarchic
organization for records. Every clause of types 2-8 describes features of
a group or field which is to be referred to by the name data-name-1 or
FILLER. FILLER is used when the group or field does not have a unique
name associated with it. Level-number 01 is used to describe features
of the entire record.

The order in which level-number clauses occur relative to other
level-number clauses is important. If such a clause is followed by one
with a higher level-number, then the first clause must describe a group
containing the field or group named in the second clause. If a
level-number clause is followed by a clause with an equal or lower
level-number, then the former must describe a field.

This clause is described in GDDL by the following statements:

a)  When the clause describes a field, the following GDDL
statement is used -

FIELD ( data-name-1, ... )

A unique name is created when FILLER occurs in the clause.

b)  When the clause describes a group, the following GDDL
statement is used -

GROUP ( data-name-1, SPEC;
    ( data-name-11, ... ),
    ...
    ( data-name-1n, ... ) ... )

There is an entry of the form ( data-name-1i, ... ) for each group or
field with the name data-name-1i at level 02.
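
The level-number rule can be sketched in Python as a pass over the clauses in source order. This is an illustrative sketch only: the `FILLER-n` naming scheme and the sample names used with it are assumptions, and the rendered statements are skeletons with the remaining parameters left as "...".

```python
def nest(entries):
    """Turn [(level, name), ...] in source order into a nested tree.

    An entry followed by a higher level-number ends up with children
    (a group); any other entry has none (a field). FILLER entries are
    given generated unique names.
    """
    root = {"name": None, "level": 0, "children": []}
    stack = [root]
    filler_count = 0
    for level, name in entries:
        if name == "FILLER":
            filler_count += 1
            name = f"FILLER-{filler_count}"   # hypothetical naming scheme
        node = {"name": name, "level": level, "children": []}
        while stack[-1]["level"] >= level:    # climb back to the parent
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def to_gddl(node):
    """Render a node and its descendants as GROUP / FIELD skeletons."""
    if not node["children"]:
        return [f"FIELD ( {node['name']}, ... )"]
    members = ", ".join(f"( {c['name']}, ... )" for c in node["children"])
    lines = [f"GROUP ( {node['name']}, SPEC; {members} )"]
    for child in node["children"]:
        lines.extend(to_gddl(child))
    return lines
```

For example, `to_gddl(nest([(1, "REC"), (2, "A"), (2, "FILLER")])[0])` yields one GROUP skeleton for REC followed by FIELD skeletons for A and the generated filler name.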

2.  [ REDEFINES data-name-2 ]

This clause is used in COBOL to specify that a different group or
field of the same length may occur in the record in the place of the one
being specified. This group or field must also be described in a
level-number clause, and, therefore, by GDDL statements of the type described
for clause 1 above. In addition, the following parameters must be used
in the GROUP statement containing entries for both data-name-1 and
data-name-2:

a)  In the list parameter for both data-name-1 and data-name-2,
the occurrence parameter must be specified as OPTIONAL (or O).

b)  In the GROUP statement, a criterion is defined for each group
or field which has been redefined. The criterion states that the data
can be present in the record in only one of the forms specified. This
is specified in the following GDDL statement -

CRITERION ( crit-name, ( COUNT ( data-name-1 )) NE
    ( COUNT ( data-name-2 )) )

3.  [ PICTURE IS picture string ]

where picture string is a string of type:

a)  alpha-form,

b)  an-form,

c)  numeric-form,

d)  report-form, or

e)  fp-form.

This clause is used in COBOL to specify data type and length for
a field. Each picture string form will be discussed separately.

a)  alpha-form. A string of n A's is an alpha-form. It represents
an EBCDIC character string of length n consisting of the alphabetic
characters and the space character. This is specified in GDDL as:

FIELD ( data-name-1, 'EALPHA', C, n, F, ... )
where: the 3rd, 4th and 5th parameters specify the length as
       being n characters and fixed, and
       the 2nd parameter specifies that the character set is
       specified by a CHAR statement and called 'EALPHA';
       it is defined as the set of EBCDIC alphabetic and
       space characters.


b)  an-form (alpha-numeric). A string of n X's is an an-form.
It represents any EBCDIC character string of length n. This is specified
in GDDL as:

FIELD ( data-name-1, EBCDIC, C, n, F, C ... )
where: the 3rd, 4th and 5th parameters specify the length
       as being n characters and fixed, and
       the 2nd parameter specifies that the character set
       is the EBCDIC character set.

c)  numeric-form. A string of 9's containing the additional
characters V, P, and S is a numeric-form. It represents a binary or
decimal, signed, fixed-point number.

When the number is binary (see USAGE IS clause), it has a length
of 2 bytes when up to 4 nines appear in the numeric-form, 4 bytes when
up to 9 nines appear, and 8 bytes when up to 18 nines appear. This is
specified in GDDL as:

FIELD ( data-name-1, EBCDIC, C, { 2 | 4 | 8 }, F, N ( B, R, FX ) ... )
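
The length rule for binary numeric fields can be written out as a short Python sketch. The thresholds come from the paragraph above; accepting the standard COBOL repetition notation `9(n)` inside the picture string, and the function names, are assumptions added for illustration.

```python
import re


def count_nines(picture):
    """Count digit positions in a numeric PICTURE string such as 'S9(5)V99'."""
    # Expand the repetition form 9(n) into n literal 9's before counting.
    expanded = re.sub(r"9\((\d+)\)", lambda m: "9" * int(m.group(1)), picture)
    return expanded.count("9")


def binary_field_bytes(picture):
    """Length in bytes of a binary field with the given numeric picture."""
    nines = count_nines(picture)
    if not 1 <= nines <= 18:
        raise ValueError(f"unsupported number of digit positions: {nines}")
    if nines <= 4:
        return 2
    if nines <= 9:
        return 4
    return 8
```

The returned value is the one that would appear as the fourth parameter of the FIELD statement.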

When no USAGE clause is specified for a numeric-form field, it
contains a decimal number. The number has a length of n bytes when n-1
9's appear in the numeric-form. This is specified in GDDL as:

FIELD ( data-name-1, EBCDIC, C, n, F, N ( D, C, FX ) ... )

d)  report-form. The report-form is a combination of the
numeric-form and fp-form used for outputting data to the printer. Therefore,
it will not be discussed separately.

e)  fp-form (floating point). A string consisting of the
characters +, -, 9, V, and E is an fp-form. It represents a decimal floating
point number with a two digit exponent. This is specified in GDDL as:

FIELD ( data-name-1, EBCDIC, C, n, F, N ( D, C, FL:
    M-data-name-1, E-data-name-1 ) ... )
where the 6th parameter specifies that the field is a floating
      point decimal number with character sign. The mantissa
      and exponent are described by FIELD statements of the
      form:

FIELD ( M-data-name-1, EBCDIC, C, m, F, N ( D, C, FX )
    ... ) and

FIELD ( E-data-name-1, EBCDIC, C, 2, F, N ( D, C, FX )
    ... CONCODE ( CONSTANT ( E, EBCDIC ), PRX ) )
where m is the number of characters in the mantissa.

4.  [  BLANK  WHEN  ZERO  ] 

This clause specifies a value to be placed in a field during
computation, and does not describe the record structure. Therefore,
it is not specified in the GDDL description of a COBOL record.

5.  [ OCCURS [ integer-1 TO ] integer-2 TIMES
        [ DEPENDING ON data-name-3 ]
        [ { ASCENDING | DESCENDING } KEY IS data-name-2 [ data-name-3 ] ... ] ...
        [ INDEXED BY index-name-1 ] ]

This clause specifies that an attribute repeats in a record.
We will discuss each subclause of this clause separately.

a)  [ integer-1 TO ] integer-2 TIMES

When [ integer-1 TO ] does not appear, this clause specifies either
the number of times or the maximum number of times a group or field
type repeats. This is specified in GDDL as follows:

GROUP ( ...
    ( data-name-1, { M | O }, integer-2, { F | V } ... ),
    ... )

where the fourth parameter is determined by the appearance
of the DEPENDING subclause.

When [ integer-1 TO ] does appear, this subclause specifies the
maximum number of times the group or field type repeats. Integer-1
specifies a minimum number of times it must repeat. This is specified
in GDDL as follows:

The group or field in question is described as two groups or fields -
one of which occurs a fixed number of times ( integer-1 - 1 times )
and the second which occurs a maximum number of times equal to integer-2 -
integer-1 + 1. The GDDL statements that specify this are:

GROUP ( ...
    ( data-name-1.1, M, integer-1 - 1, F ... ),
    ( data-name-1.2, M, integer-2 - integer-1 + 1, V ... ),
    ... )

b)  [ DEPENDING ON data-name-3 ]

This clause is used when a group or field may repeat a variable
number of times. The actual repetition number is stored as a value in
the field named data-name-3. This is specified in GDDL as follows:

GROUP ( ...
    ...
    ( data-name-1, ..., data-name-3, V ... ),
    ... )
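
The arithmetic behind the two-entry description of [ integer-1 TO ] integer-2 can be checked with a small Python sketch. It assumes that the mandatory (M) variable entry occurs at least once, which is what makes the overall minimum come out as integer-1; that reading, and the function names, are assumptions rather than statements from the text.

```python
def split_occurs(integer_1, integer_2):
    """Return (fixed_count, variable_max) for OCCURS integer-1 TO integer-2."""
    if not 1 <= integer_1 <= integer_2:
        raise ValueError("expected 1 <= integer-1 <= integer-2")
    fixed_count = integer_1 - 1               # entry data-name-1.1, F
    variable_max = integer_2 - integer_1 + 1  # entry data-name-1.2, V
    return fixed_count, variable_max


def distribute(actual, integer_1, integer_2):
    """Split an actual repetition count (the value of the DEPENDING ON
    field) between the fixed entry and the variable entry."""
    fixed_count, variable_max = split_occurs(integer_1, integer_2)
    variable_count = actual - fixed_count
    if not 1 <= variable_count <= variable_max:
        raise ValueError("repetition count outside integer-1 .. integer-2")
    return fixed_count, variable_count
```

With integer-1 = 3 and integer-2 = 10 the fixed entry occurs twice and the variable entry between one and eight times, so the total ranges from 3 to 10 as the COBOL clause requires.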

c)  [ { ASCENDING | DESCENDING } KEY IS data-name-2 [ data-name-3 ] ... ]

This subclause is used to specify that the values of a repeating
group or field are ordered in ascending or descending order by the
fields named data-name-2, data-name-3, etc.

When a field repeats, data-name-2 must equal data-name-1. This
is specified in GDDL as follows:

GROUP ( ...
    ... )

When data-name-1 refers to a group which repeats, this is specified
in GDDL as follows:

GROUP ( ...
    ...
    ( data-name-1, ... ; 0, crit-name ... ),
    ...
    )

where crit-name refers to a criterion which determines when
one group is to be placed after another group. This
criterion is specified as follows:

Assuming that keys data-name-2, ..., data-name-n
are given in the COBOL description, the following CRITERION
statement must be specified.

CRITERION ( crit-name, (crit2) OR ... OR (critn) )
where criti, for 2 ≤ i ≤ n, is specified as:

CRITERION ( criti, ( (data-name-2 OF OCC (data-name-1, H))
    EQ (data-name-2 OF OCC (data-name-1, T)) ) AND ...
    AND ( (data-name-i-1 OF OCC (data-name-1, H))
    EQ (data-name-i-1 OF OCC (data-name-1, T)) ) AND
    ( (data-name-i OF OCC (data-name-1, H)) LT
    (data-name-i OF OCC (data-name-1, T)) ) AND
    ( ALLOCC ( X1; NOT ( ( (data-name-i OF OCC
    (data-name-1, X1)) LT (data-name-i OF OCC
    (data-name-1, T)) ) AND ( (data-name-i OF OCC
    (data-name-1, H)) LT (data-name-i OF OCC
    (data-name-1, X1)) ) ) ) ) )

d)  INDEXED BY index-name-1

This subclause is used to specify working storage internal to the
program and does not describe data stored in the COBOL record.
Therefore, it is not described in the GDDL specification of the COBOL record.
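
The ordering criterion of subclause c) can be read as a direct transcription into Python. The sketch below follows the criterion as written for the ascending case; occurrences are modelled as dictionaries, key positions are numbered from 0 rather than from 2, and the function names are hypothetical.

```python
def crit_i(h, t, occurrences, keys, i):
    """criti: all keys before position i are equal in h and t, key i is
    smaller in h than in t, and no occurrence has key i strictly between."""
    equal_prefix = all(h[k] == t[k] for k in keys[:i])
    k = keys[i]
    ascending = h[k] < t[k]
    nothing_between = all(
        not (x[k] < t[k] and h[k] < x[k]) for x in occurrences
    )
    return equal_prefix and ascending and nothing_between


def crit(h, t, occurrences, keys):
    """crit-name: the OR of criti over every key position."""
    return any(crit_i(h, t, occurrences, keys, i) for i in range(len(keys)))
```

Here `h` and `t` play the roles of the occurrences marked H and T in the CRITERION statement, and `occurrences` is the set ranged over by ALLOCC.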


6.  [ USAGE IS { DISPLAY | COMPUTATIONAL | COMPUTATIONAL-1 |
                 COMPUTATIONAL-2 | COMPUTATIONAL-3 | INDEX } ]


This clause is used to specify data type and length. It can be
specified at field or group level. When it is specified at group level
it specifies data type and length for all of the fields in the group.

a)  When usage is DISPLAY, data type is character string and
length is specified in a PICTURE clause.

b)  When usage is COMPUTATIONAL, data type is binary number and
length is specified in a PICTURE clause.

c)  When usage is COMPUTATIONAL-1, data type is binary, floating
point number 4 bytes in length. The mantissa has a length of 24 bits.
The procedure for specifying such a field is described in the discussion
of fp-form under the PICTURE clause.

d)  When usage is COMPUTATIONAL-2, the field is the same as a
COMPUTATIONAL-1 field except that the length is 8 bytes.

e)  When usage is COMPUTATIONAL-3, data type is decimal number
specified by a PICTURE clause where the character set is packed decimal
( 4 bits per digit ). The character code is specified in GDDL in the
CHAR statement.

f)  Usage INDEX is used to describe working storage for
the program and is therefore not specified in the GDDL description of
a COBOL record.

7.  [ JUSTIFIED RIGHT ]

Normally COBOL character strings are left justified with trailing
blanks. This clause is used to specify that the character string in
question is to be right justified with leading blanks. This is
specified in GDDL as -

FIELD  (  data-name-1,  ...  ;  V,  R,  CONSTANT  (  'E'  )  ...  ) 

8.  [  VALUE  IS  literal  ] 

This  clause  is  used  to  specify  that  a  field  is  set  to  a  new  value 
during  program  execution.  It  does  not  affect  the  stored  data,  and  thus 
is  not  specified  in  the  GDDL  description  of  the  COBOL  record. 



2.  The Proper Containment of the COBOL Record Description Features
in GDDL

We  have  shown  that  the  GDDL  includes  every  COBOL  record  description 
feature.  Now,  we  show  that  there  are  record  level  features  which  are 
describable  in  GDDL  but  not  provided  by  COBOL.  Three  examples  are  given, 
each  one  highlighting  a  different  feature. 

Example  1.  The  Specification  of  a  Characteristic  in  Terms  of  Other 
Characteristics 

In GDDL, characteristics of records, groups and fields
can be specified in terms of other characteristics. For example, a
user can describe in GDDL the length in bits of a field X as equal to
the number of times another field Y repeats. This is specified as follows:

FIELD ( 'X', B, B, COUNT ( 'Y' ), V, ... )

The fourth parameter specifies that the length of field X is equal to
the number of repetitions of Y.

COBOL provides no way to make characteristics dependent on other
characteristics.

Example  2.  Variable  Characteristics  and  the  Use  of  Delimiters 

In  GDDL,  record  level  characteristics  such  as  field  length  and 
repetition  number  can  be  described  which  vary  depending  on  the  data. 

In  such  cases,  delimiters  are  used  to  indicate  the  beginning  and/or 
end  of  the  fields  in  question.  For  example,  a  user  can  specify  that  the 
length  of  some  field  X  is  to  vary  with  the  character  $  used  as  a  delimiter. 
This is described as follows:

FIELD ( 'X', B, B, NOLIM, V, ...
    CONCODE ( CONSTANT ( $, EBCDIC ), PRX ),
    CONCODE ( CONSTANT ( $, EBCDIC ), PTX ) )

The  final  two  parameters  specify  that  $  is  to  be  used  as  a  beginning 
and  end  delimiter  for  values  of  the  field  X. 

COBOL  provides  no  delimiter  feature. 
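
To make the delimiter idea concrete, the Python sketch below reads one variable-length field framed by a leading and a trailing $ from a byte string. It is an illustration under assumptions: the data are taken to be byte-aligned EBCDIC (Python's `cp037` code page is used as a stand-in), the value itself is assumed not to contain the delimiter, and `read_delimited` is a hypothetical name.

```python
DELIMITER = "$".encode("cp037")  # the delimiter character in EBCDIC


def read_delimited(data, start, delimiter=DELIMITER):
    """Read one field framed by a prefix and a postfix delimiter.

    Returns (value_bytes, position_after_the_closing_delimiter).
    """
    if data[start:start + 1] != delimiter:
        raise ValueError("field does not begin with the delimiter")
    end = data.find(delimiter, start + 1)
    if end == -1:
        raise ValueError("closing delimiter not found")
    return data[start + 1:end], end + 1
```

Reading two consecutive fields from the EBCDIC encoding of `$ACME$$42$` gives the values ACME and 42, with each call returning the position at which the next field begins.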

Example 3.  The Specification of Bit Fields

In GDDL, field lengths can be specified in terms of any character
code or in bits. Thus, a user can describe the length of a field X to
be a single bit. This is specified as follows:

FIELD ( 'X', B, B, 1, F, ... )

The  third  parameter  specifies  that  length  is  given  in  bits  and  the  fourth 
parameter  specifies  that  the  length  is  1. 

COBOL  allows  length  to  be  specified  in  characters  only. 

3.  Conclusion

In section 1 we showed that GDDL includes every COBOL record
description feature. In section 2 we saw that COBOL does not include
all of the GDDL record description features. Thus, we may conclude that
record level GDDL properly contains the COBOL record description features.


